Abstract
Background
In Japan, consumers can purchase most over-the-counter (OTC) drugs without pharmacist guidance. Recently, generative artificial intelligence (AI) has become increasingly popular. Therefore, medical professionals need to consider the use of generative AI by consumers for medication counseling. We previously reported responses in Japanese from ChatGPT-3.5 to 264 questions regarding whether each of 22 OTC drugs can be taken under 12 typical patient conditions. The proportion of responses that satisfied the criteria of 1) accuracy, 2) relevance, and 3) reliability with respect to package insert instructions was 20.8%. In November 2023, OpenAI launched GPTs, enabling the construction of customized versions of ChatGPT using natural language. In the present study, we compared performance in providing medication guidance among a newly customized GPT, the latest non-customized version, ChatGPT-4o, and the previous version, ChatGPT-3.5. The aim was to determine whether customization and version updates of ChatGPT improved performance and to evaluate its potential usefulness.
Methods
We configured customized ChatGPT-4 by executing five instructions in Japanese and uploaded the text of package inserts for 22 OTC drugs as knowledge. We asked the same 264 questions as in our previous study.
Results
With the customized ChatGPT-4, the percentages of responses that satisfied the criteria of accuracy, relevance, and reliability were 93.2%, 100%, and 60.2%, respectively. Additionally, 56.1% of responses satisfied all three criteria, 2.7-fold higher compared with ChatGPT-3.5 and 1.3-fold higher compared with ChatGPT-4o.
Conclusion
The performance of our customized GPT far exceeded that of ChatGPT-3.5. In particular, the proportion of appropriate responses to the questions using brand names was significantly improved. ChatGPT can be customized by providing drug package insert information and using appropriate prompt engineering, potentially offering helpful tools in clinical pharmacy.
Introduction
With the popularization of the Internet, many people now gather information about health and medications through web searches. A survey of approximately 10,000 medication users over the age of 40 found that 47% had used a search engine such as Google to search for health-related information. 1 Of 400 adults surveyed in the United States, 75% had searched the Internet for health-related information. 2 A survey of 161 pharmacists found that 80% had received Internet-based medication-related inquiries from patients. 3
In Japan, consumers can purchase two classes of drugs at a drugstore or pharmacy without a prescription. One class, pharmaceuticals requiring guidance by a pharmacist, can be purchased only after a face-to-face consultation with a pharmacist. The other class, over-the-counter (OTC) drugs, is further classified into categories 1, 2, and 3. Category 1 can be sold via the Internet with consultation by a pharmacist, and categories 2 and 3 can be sold without a pharmacist's consultation. 4
Therefore, consumers may rely on the Internet, rather than a pharmacist, for advice and information on OTC drugs. Since the release of ChatGPT-3.5 in November 2022, generative artificial intelligence (AI) has become increasingly popular. Because generative AI excels at generating text and answering questions,5–7 consumers may turn to these AI tools for OTC drug consultations. We have previously analyzed the quality of responses by ChatGPT-3.5 to putative consultations for OTC drugs in Japanese. 8
We selected 22 popular OTC drugs and 12 common consumer conditions, combining them to create 264 putative questions, and analyzed ChatGPT-3.5's responses to these questions. The obtained responses were evaluated based on the following three criteria: 1) accuracy, 2) relevance, and 3) reliability of the instructed actions. However, only 20.8% of the 264 responses satisfied all three criteria. Huang et al. reported more favorable results when comparing the performance of ChatGPT-3.5 with that of clinical pharmacists, finding that ChatGPT-3.5 achieved response scores similar to those of clinical pharmacists for drug counseling. 9
In March 2023, ChatGPT-4, a newer version of ChatGPT with enhanced text-generation and question-answering capabilities, was released. The performance of ChatGPT-4 was reported to be sufficient to pass the national pharmacist examination in Japan, with an accurate response rate of 72.5%. 10 However, the use of generative AI in the medical field remains questionable due to concerns such as “hallucinations”, which occur when an AI generates inaccurate answers or references to non-existent resources and may potentially harm patients. More recently, in November 2023, OpenAI released GPTs, which allow users to create custom versions of ChatGPT, using natural language without the need for coding. A customized GPT has the capability to handle specific tasks based on the particular knowledge and instructions provided by users. OpenAI states on its official website that these customized GPTs can be used for various purposes, such as learning the rules of board games or assisting in teaching children's math (https://openai.com/index/introducing-gpts/). This rapid development of text-generating AI suggests that applications in clinical pharmacy will be possible in the near future. In the present study, we constructed a customized ChatGPT-4 (cGPT) with package insert information for OTC drugs. We employed the same putative questions and evaluation methods as in our previous study in order to compare the performance of cGPT with that of ChatGPT-3.5. We also compared it with ChatGPT-4o, the latest version. The purpose of this study was to assess whether cGPT can be utilized as an aid in clinical pharmacy, particularly in medication counseling and self-medication.
Methods
Selection of OTC drugs
We selected 22 common medications sold as OTC drugs in Japan, consistent with our previous study. 8 To create the questions, we used the generic name for 10 of the 22 drugs and the brand name for the remaining 12 drugs (Table 1). Eighteen of the 22 drugs belong to category 2. These drugs have a relatively higher risk, and safety information should be provided in order to prevent rare but possible severe adverse reactions that might require hospitalization. 11 Additionally, most of the active ingredients in OTC drugs sold in Japan are categorized as category 2. 12
List of 22 OTC drugs.
Category 1: Pharmacists are required to counsel the consumer due to high risk.
Category 2: Pharmacist counseling is not required and the medication can be sold by registered sales clerks as well as pharmacists; however, these drugs have relatively high risk (rare possible severe adverse reactions that may require hospitalization).
Category 3: Sales procedure is the same as category 2; however, these have relatively low risk.
Customization of GPTs
Creation of GPTs
GPT creation is a feature available exclusively to paid users; free users have access to publicly available GPTs, though with certain limitations. The standard ChatGPT webpage can be accessed at https://chatgpt.com/, whereas cGPT creation is conducted through a different interface, at https://chatgpt.com/gpts/editor. The first step in creating a cGPT was to determine its name and icon. This was followed by providing instructions on how the cGPT should respond to prompts. Next, knowledge was uploaded, in PDF format in our case. Finally, we enabled the cGPT to access the web.
Guidelines for configuring the cGPT
The cGPT was configured in Japanese, using the following guidelines:
#1. The response to the question should be one of three options: “Contraindicated,” “Consult a medical professional,” or “Allowed.”
#2. Avoid ambiguous responses, such as adding “However, you should consult a specialist or doctor.”
#3. Responses should be provided step-by-step.
#4. If appropriate knowledge is not found in the uploaded file, the response should be generated using ChatGPT-4 while adhering to guidelines #1, #2, and #3.
#5. All responses should be in Japanese.
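As a rough illustration, the five guidelines above can be assembled into a single instruction block of the kind pasted into the GPT editor. The wording below is a hypothetical English rendering for illustration only; the study's actual instructions were written in Japanese.

```python
# Hypothetical English rendering of the five configuration guidelines;
# the original instructions were given in Japanese in the GPT editor.
GUIDELINES = [
    'Answer with exactly one of: "Contraindicated", '
    '"Consult a medical professional", or "Allowed".',
    'Avoid ambiguous hedges such as appending '
    '"However, you should consult a specialist or doctor".',
    "Explain your reasoning step by step.",
    "If the uploaded package inserts lack the answer, fall back to "
    "general model knowledge while still following rules 1-3.",
    "Respond in Japanese.",
]

def build_instruction_block(guidelines):
    """Number each guideline and join them into one instruction string."""
    return "\n".join(f"#{i}. {g}" for i, g in enumerate(guidelines, start=1))

print(build_instruction_block(GUIDELINES))
```

The numbering mirrors the #1–#5 convention used in the configuration so that later guidelines (e.g. #4) can refer back to earlier ones unambiguously.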
Uploading package insert information to cGPT
The package inserts for the 22 OTC drugs provided by the Pharmaceuticals and Medical Devices Agency (PMDA) as of March 2024 were used in HTML format. The HTML files of the package inserts in Japanese were imported into Microsoft Excel using the web query function, reformatted to the first normal form, and saved as comma-separated values (CSV) files. These CSV files were then merged to create a single PDF file, which was uploaded to the cGPT; this merging was necessary because GPTs allow at most 20 uploaded files. This operation was performed simultaneously with the customization process.
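The consolidation step (many per-insert CSV files combined so the upload stays within the 20-file limit) can be sketched as follows. The study performed the HTML extraction with Excel's web query and produced a PDF, so this stdlib sketch of the CSV merge is an illustrative stand-in with hypothetical file names, not the authors' actual tooling.

```python
import csv
import io

def merge_csv_texts(csv_texts):
    """Concatenate several CSV documents (one per package insert)
    into a single table, prefixing each row with its source name
    so that the origin of every row remains traceable after merging."""
    merged = []
    for name, text in csv_texts:
        for row in csv.reader(io.StringIO(text)):
            merged.append([name] + row)
    return merged

# Hypothetical two-drug example standing in for the 22 package inserts.
inserts = [
    ("drug_A.csv", "section,text\nDosage,One tablet\n"),
    ("drug_B.csv", "section,text\nWarnings,Do not drive\n"),
]
rows = merge_csv_texts(inserts)
```

Keeping the source-file column preserves the drug-to-row mapping that would otherwise be lost when 22 separate tables are flattened into one document.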
Other settings
Web browsing and code-interpreter functions were enabled.
Response evaluation
We selected 12 consumer characteristics, including consumer background (pregnancy, elderly, and driving), medical conditions (glaucoma, gastric ulcer, hypertension, hemodialysis, and past history of asthma), and concomitant medications (antihistamines, motion sickness drugs, cough medicines, and pain relievers), as in our previous study. We selected these three consumer backgrounds and five disease conditions because they are frequently mentioned or featured in alerts in the package inserts of OTC drugs in Japan. For concomitant medications, we chose those most commonly used as OTC drugs in Japan. A total of 264 questions (22 drugs × 12 conditions) were created.
The responses were evaluated based on three criteria: 1) accuracy, 2) relevance, and 3) reliability of the instructed actions.
The first two criteria were assessed with “yes” or “no” responses. Accuracy refers to the scientific correctness of the answer. Relevance refers to whether the answer logically addresses the question. Reliability is defined by whether the instructions in the answer are consistent with those in the package insert. The evaluation criteria for accuracy, relevance, and reliability are identical to those for correctness, coherence, and appropriateness, respectively, in our previous study. 8
The instructed actions of ChatGPT and the package inserts were categorized into three groups: “allowed” (no special precautions or consultation required), “requires consultation (with a medical professional),” and “contraindicated.” Responses were considered “reliable” if the instructed actions from cGPT matched the recommendations in the package insert; otherwise, they were deemed “unreliable.” For response evaluation, two pharmacists independently evaluated the responses from the cGPT and ChatGPT-4o, and agreement between their assessments was verified using the κ coefficient. In the case of discrepancies, consensus was reached through discussion and reconciliation.
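Inter-rater agreement of this kind is commonly quantified with Cohen's κ, which corrects observed agreement for the agreement expected by chance. A minimal stdlib sketch, using made-up example labels rather than the study's actual ratings:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater1)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy example: two raters judging four responses as "yes"/"no".
print(cohens_kappa(["yes", "yes", "no", "no"],
                   ["yes", "no", "no", "no"]))  # 0.5
```

Values above 0.81 are conventionally interpreted as almost perfect agreement, which is the interpretation applied to the κ values of 0.95 and 0.91 reported in the Results.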
Evaluation metrics included the proportion of appropriate responses satisfying each criterion as well as the proportion of responses satisfying all three criteria. In addition, we compared the proportion of appropriate responses to questions using generic versus those using brand names. To assess reproducibility, questions that satisfied all three criteria were validated a second time on a separate day. The first trial was conducted from April 2 to April 9, 2024, for the cGPT and from November 15 to November 20, 2024, for ChatGPT-4o. The second trial was conducted from May 2 to May 21, 2024, for the cGPT and from November 25 to December 6, 2024, for ChatGPT-4o.
The results were compared with those from our previous study involving ChatGPT-3.5 8 and with the results from ChatGPT-4o.
Error analysis
Error analysis was conducted for questions where the responses from the cGPT did not align with the package insert instructions. The cGPT responses were configured to prioritize knowledge-based input. In cases where information could not be retrieved from the knowledge source, the response explicitly stated that the information was unavailable. This setup allows responses based on the uploaded knowledge to be differentiated from responses generated using web browsing. Based on the content of the responses, the error analysis categorized the issues as follows:
Step 0: Unknown reasons, because the response was not step by step.
Step 1: cGPT failed to load knowledge (response relied on web browsing).
Step 2: cGPT successfully loaded knowledge but analyzed it inaccurately.
Step 3: cGPT accurately analyzed knowledge but provided incorrect instructions.
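The four-step triage can be expressed as a small decision function. The boolean flags below are hypothetical stand-ins for the judgments the evaluators made when reading each failed response; the study's categorization was manual, not automated.

```python
def classify_error(stepwise, loaded_knowledge, analysis_correct):
    """Assign a failed response to one of the four error-analysis steps.
    The flags are hypothetical evaluator judgments, not automated checks."""
    if not stepwise:
        return "Step 0: unknown (response was not step by step)"
    if not loaded_knowledge:
        return "Step 1: failed to load knowledge (fell back to web browsing)"
    if not analysis_correct:
        return "Step 2: loaded knowledge but analyzed it inaccurately"
    return "Step 3: analyzed knowledge correctly but instructed incorrectly"
```

The ordering matters: each step is only reachable when all earlier steps succeeded, mirroring the sequential load-analyze-instruct pipeline described above.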
Statistical analysis
Statistical differences in the proportions of appropriate responses between ChatGPT-3.5 or ChatGPT-4o and the cGPT were assessed using Fisher's exact test with Bonferroni correction for multiple comparisons.
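As an illustration, a comparison of this kind can be reproduced with SciPy. The first count below (148/264 for the cGPT) is reported directly in the Results; the second (55/264 for ChatGPT-3.5) is back-calculated from the reported 20.8% and is therefore an approximation.

```python
from scipy.stats import fisher_exact

# 2x2 contingency table: responses satisfying all three criteria vs. not.
# 148/264 for the cGPT is reported directly; 55/264 for ChatGPT-3.5 is
# back-calculated from the reported 20.8% of 264 questions.
table = [[148, 264 - 148],   # cGPT
         [55, 264 - 55]]     # ChatGPT-3.5
odds_ratio, p_value = fisher_exact(table)
print(p_value)  # far below 0.05
```

With multiple pairwise comparisons (cGPT vs. ChatGPT-3.5 and cGPT vs. ChatGPT-4o across several criteria), the significance threshold is divided by the number of tests under the Bonferroni correction.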
Results
The κ coefficient was 0.95 for the cGPT and 0.91 for ChatGPT-4o, indicating almost perfect agreement between the two evaluators. Figure 1 shows the performance comparison among the cGPT, ChatGPT-3.5, and ChatGPT-4o. Out of 264 questions, the numbers of appropriate responses from the cGPT in terms of the accuracy, relevance, and reliability of the instructed actions were 246 (93.2%), 264 (100%), and 159 (60.2%), respectively. The performance of the cGPT showed significant improvements in each of the three criteria compared with ChatGPT-3.5. Additionally, the number of questions that satisfied all three criteria was 148 (56.1%), significantly higher (2.7-fold) than with ChatGPT-3.5.

Proportions of responses from ChatGPT-3.5, ChatGPT-4o, and cGPT that satisfied each and all criteria. n = 264. cGPT outperformed ChatGPT-3.5 in accuracy and relevance by 1.7-fold and 1.3-fold, respectively, and the number of questions that satisfied all three criteria was 2.7-fold higher with the cGPT. cGPT also exceeded ChatGPT-4o in accuracy by 1.3-fold, and the number of questions that satisfied all three criteria was 1.3-fold higher with the cGPT. Fisher's exact test (Bonferroni correction).
Figure 2 compares responses to questions using generic names (n = 120) with those using brand names (n = 144) for each criterion. Regardless of whether questions used generic or brand names, cGPT demonstrated improved performance across all criteria compared with ChatGPT-3.5. Specifically, although accuracy and relevance showed significant improvement, the improvement in reliability was not statistically significant.

Proportions of responses that satisfied each and all criteria when generic and brand names were used. Generic name (n = 120), brand name (n = 144). A: accuracy; B: relevance; C: reliability; D: satisfied all three criteria. Performance in terms of accuracy for questions using brand names was low for ChatGPT-3.5 and ChatGPT-4o but improved with the cGPT by 2.0-fold and 1.4-fold, respectively. Fisher's exact test (Bonferroni correction).
Table 2 compares the instructed actions among the cGPT, ChatGPT-4o, and the package inserts. Whereas our previous and present studies each included one problematic case in which ChatGPT-3.5 or ChatGPT-4o allowed the use of a drug under a condition contraindicated by the package insert, the cGPT exhibited no such cases. The instructions for action by the cGPT were consistent with the package inserts in 30.8% of the “allowed” cases, 89.7% of the “consultation required” cases, and 53.7% of the “contraindicated” cases. For both the cGPT and ChatGPT-4o, the distribution of discrepancies was significantly biased.
Comparison of instructed actions between cGPT, ChatGPT-4o and package inserts.
cGPT: customized GPT.
Figure 3 shows the results of the error analysis obtained step by step. Although the knowledge was correctly analyzed in 24.2% of cases, the final instructions did not align with the package insert. This occurred because, despite correctly analyzing the absence of precautions or contraindications in the package insert, the responses from the cGPT were overly cautious, often recommending consultation rather than allowing action based solely on the analysis. In 8.0% of cases, information not mentioned in the package insert was generated or the data were not analyzed accurately (Supplementary Table 1). Additionally, in 8.7% of cases, the system could not locate the relevant document within the knowledge base. The 3.0% categorized as “unknown” primarily consisted of cases where the system could not respond step by step, making error analysis infeasible.

Error analysis using cGPT's decision-making process. Step 0 indicates cases where cGPT could not respond step by step. In Step 1, cGPT loads knowledge. In Step 2, the knowledge is accurately analyzed. In step 3, cGPT provides instructions for action. Percentages represent the proportion of all 264 questions.
Figure 4 shows the result of the second survey to confirm the reproducibility of the cGPT. Of the 148 questions that satisfied all three criteria in the first survey using the cGPT, 107 (72.3%) also satisfied all three criteria in the second survey. This reproducibility of the cGPT was higher than that of ChatGPT-3.5, although the difference was not statistically significant. In contrast, the reproducibility showed a significant improvement by 1.3-fold for the cGPT compared with ChatGPT-4o.

Comparison of the reproducibility of ChatGPT-3.5, ChatGPT-4o, and cGPT. For the cases that satisfied all three criteria in the first survey, the proportion of cases that also satisfied all three criteria in the second survey is shown. The reproducibility of cGPT was higher than that of ChatGPT-3.5 and ChatGPT-4o by 1.2-fold and 1.3-fold, respectively. Fisher's exact test (Bonferroni correction).
Discussion
The proportion of appropriate responses from the cGPT was superior to that from ChatGPT-3.5. The cGPT also tended to provide more appropriate responses compared with ChatGPT-4o, demonstrating the potential application of generative AI in OTC drug counseling.
This improvement in performance may be attributed, at least in part, to the ability of GPTs to properly learn the information necessary for specific counseling scenarios. The improved performance of the cGPT may therefore be attributable both to improvements in the underlying ChatGPT model and to the uploaded knowledge.
Although the cGPT exhibited increased performance across all three criteria, the improvement in instructed action was relatively modest. However, improving the consistency of contraindications from 7.3% to 53.7% indicates the potential feasibility of utilizing cGPT in drug safety assessments. For 107 cases in which the package insert allowed use, cGPT advised consultation or specified contraindication in 74 (69.2%) cases. However, this does not necessarily imply that all the cGPT responses were inaccurate. For instance, Royal Jelly/Ginseng Fluid extract is a nutritional drink containing 50 mg of caffeine anhydrous per bottle, and its package insert does not list any drug interactions, including those with caffeine-containing drugs. Similarly, the package insert of caffeine tablets, an OTC drug used to suppress drowsiness, only advises against taking drugs within the same pharmacological category. Given this context, the cautionary response of the cGPT regarding the concomitant use of Royal Jelly/Ginseng Fluid extract or caffeine tablets with other caffeine-containing drugs is reasonable and should not be considered inaccurate. Thus, some instructions provided by cGPT may offer useful information that extends beyond what is stated in the package insert. These responses, especially when generic names are used in the questions, may have been influenced by information from overseas sources, as we did not restrict the question format to situations in Japan.
Responses generated by AI may include hallucinations, which can be particularly problematic when applied to healthcare practices. 13 Compared with ChatGPT-3.5, the cGPT in the present study exhibited a significantly reduced number of hallucinations, as assessed by the number of obvious scientific errors. More than 90% of the responses were accurate and relevant. Although the responsibility for this has traditionally fallen on the creator of each customized GPT, OpenAI published key guidelines in May 2024 aimed at enhancing the reliability and accuracy of GPT construction. 14 These guidelines comprise six key principles: 1. simplify complex instructions; 2. structure for clarity; 3. promote attention to detail; 4. avoid negative instructions; 5. granular steps; and 6. consistency and clarity. Checking the correspondence of our GPT customization procedure with these guidelines, we avoided negative expressions and gave simple instructions (#1 and #4). Additionally, we ensured attention to detail (#3) by requiring step-by-step responses. Our implementation of these principles likely contributed to the reduction of hallucinations. However, they were not entirely eliminated, likely due to failures to accurately analyze the uploaded package inserts. For example, cGPT inaccurately stated that “the package insert of kakkonto instructs avoiding driving a car after taking it,” when, in fact, no such instruction exists in the package insert. The instructions provided during the construction of GPTs are crucial in preventing hallucinations.
Generative AI responses may not always be reproducible. In the present study, the proportion of responses that satisfied all three criteria in the second survey did not show a significant improvement compared with ChatGPT-3.5. To enhance reproducibility, it may be necessary to provide examples of clear and consistent responses to questions during the construction of GPTs. Additionally, providing instructions to perform specific actions, such as using the code interpreter to analyze the files when answering questions, may help to increase reproducibility. In addition to reducing hallucinations, following the key guidelines described above might assist in constructing GPTs that produce more accurate and consistent responses with sufficient reproducibility.
To our knowledge, no other studies have evaluated the consultation performance of ChatGPT customized with information from drug package inserts. In the field of ophthalmology, Sevgi et al. recently suggested that when customized with guidelines for diabetic retinopathy and angle closure glaucoma, ChatGPT might be useful for providing medical education and supporting clinical decision-making. 15 Similarly, Gorelik et al. customized ChatGPT with guidelines for pancreatic cysts and reported that 87% of its responses to clinical scenarios were in agreement with gastroenterologists’ recommendations. 16 In the present study, we successfully demonstrated the usefulness of ChatGPT customized by uploading information from drug package inserts, which are the most common and basic sources of drug information, suggesting the potential usefulness of generative AI in the field of pharmacy.
Web searches, such as those provided by Google, require users to extract and synthesize information from the search results based on their own judgment. In contrast, cGPT provides direct judgment results based on its knowledge. Therefore, the users do not need to select or make judgments about information. However, if the knowledge base of cGPT is not up-to-date, web searches may be advantageous for providing more current information. Additionally, cGPT carries the risk of hallucinations, while web searches also have the potential to yield inaccurate information. In either case, the ultimate responsibility for decision-making lies with the user. The cGPT in this study shows a notable advantage in that it can be developed with natural language inputs, without requiring coding skill. However, for applications requiring real-time information retrieval, the use of Retrieval-Augmented Generation (RAG) may be more appropriate. Nevertheless, implementing RAG typically requires a higher level of technical proficiency, including programming skills, and a substantial investment of time and resources.
This study has some limitations. The first is that the results were compared with those obtained using ChatGPT-3.5 and ChatGPT-4o and not with the responses of clinical pharmacists. Therefore, to estimate the potential for cGPTs to take on the role of clinical pharmacists, further studies are needed to compare the performance of cGPTs with that of clinical pharmacists. Additionally, the selection of the 22 OTC drugs and 12 consumer conditions considered in this study was arbitrary, and different results might have been obtained if other drugs and/or consumer conditions had been targeted. However, this study does cover common medications such as caffeine, diphenhydramine, and antipyretic analgesics, which have been reported as major causes of OTC drug overdoses in Japan. 17 Therefore, the results are considered to reflect, to a certain extent, scenarios in which medication consultations were sought using ChatGPT, particularly regarding the proper use of problematic OTC drugs. In the future, generalizability could be improved by understanding the types of drug-related questions that consumers intend to input into generative AI and by building and evaluating cGPT based on those questions.
Conclusion
GPTs customized with information from drug package inserts using appropriate instructions were demonstrated to be a potentially useful drug information tool for consumers and patients, suggesting the potential usefulness of GPTs in the field of pharmacy. The rapid evolution of ChatGPT further suggests its future potential to assist in medication counseling through the application of appropriate knowledge, improved prompt engineering, and analysis of patterns in patients' questions.
Supplemental Material
sj-xlsx-1-dhj-10.1177_20552076251323810 - Supplemental material for Medication counseling for OTC drugs using customized ChatGPT-4: Comparison with ChatGPT-3.5 and ChatGPT-4o
Supplemental material, sj-xlsx-1-dhj-10.1177_20552076251323810 for Medication counseling for OTC drugs using customized ChatGPT-4: Comparison with ChatGPT-3.5 and ChatGPT-4o by Keisuke Kiyomiya, Tohru Aomori and Hisakazu Ohtani in DIGITAL HEALTH
Supplemental Material
sj-docx-2-dhj-10.1177_20552076251323810 - Supplemental material for Medication counseling for OTC drugs using customized ChatGPT-4: Comparison with ChatGPT-3.5 and ChatGPT-4o
Footnotes
Author contributions
Conceptualization: OH. Data curation: KK. Formal analysis: KK and AT. Investigation: KK and AT. Methodology: OH and KK. Writing – original draft: KK. Writing – review & editing: AT and OH. All authors have read and agreed to the published version of the manuscript.
Data availability
The data underlying the results of this article will be shared by the corresponding author upon reasonable request.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical considerations
This article does not contain any data from humans.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References