Abstract
Background:
Large language model (LLM)–based chatbots, such as ChatGPT and Gemini, have become widely used sources of medical information. No study has assessed the performance of LLM chatbots in providing clinically reliable information on high tibial osteotomy (HTO).
Purpose:
To evaluate the accuracy and relevance of different LLM chatbots in responding to frequently asked questions (FAQs) about HTO.
Study Design:
Cross-sectional study.
Methods:
A total of 35 FAQs about HTO were curated from online sources and categorized into 6 categories: general/procedure related, indications for surgery and outcomes, risks and complications of surgery, pain and postoperative recovery, specific activities after surgery, and alternatives to and variations of HTO. These questions were used as input to 5 different LLM chatbots: ChatGPT-3.5, ChatGPT-4, ChatGPT–4 Omni, Gemini Advanced and Gemini 1.5. Responses were collected from July 12 to 14, 2024 (ChatGPT-3.5, ChatGPT-4, ChatGPT–4 Omni, and Gemini Advanced) and on September 26, 2024 (Gemini 1.5). Two independent orthopaedic surgeons assessed the responses using a 5-point Likert scale (1 = very incorrect/very irrelevant, 5 = very accurate/very relevant). Responses were anonymized to blind evaluators to chatbot identities. Differences in accuracy among chatbots were assessed using analysis of variance, and differences in relevance using the Kruskal-Wallis test.
Results:
LLM chatbots demonstrated the following mean accuracy scores: GPT-3.5 (4.66 ± 0.64), GPT-4 (4.66 ± 0.54), GPT–4 Omni (4.94 ± 0.24), and Gemini 1.5 (4.86 ± 0.36), while Gemini Advanced showed significantly lower scores (3.83 ± 1.40) (P < .001) in answering HTO-related FAQs. Particularly, Gemini Advanced exhibited lower accuracy scores in the categories of indications and outcomes (P = .002) and alternatives and variations (P = .015). There were no significant differences among the models regarding general/procedure related (P = .12), risks and complications (P = .50), pain and postoperative recovery (P = .53), and specific activities after surgery (P = .09). All models provided relevant answers to all questions (35/35; 100%), except for Gemini Advanced (30/35; 85.7%).
Conclusion:
This study showed that ChatGPT-3.5, ChatGPT-4, ChatGPT–4 Omni, and Gemini 1.5 provided accurate and relevant responses on HTO, whereas Gemini Advanced exhibited limitations and underperformed in comparison with the other models.
In recent years, large language models (LLMs) have been increasingly explored for their potential to support health care services and improve patient experience. 33 Their applications span various medical fields, including patient education, imaging analysis, diagnostic support, drug discovery, communication between doctors and patients, medical content development, and enhancement of clinical workflows.2,8 At a rapid pace, various types of LLM services are now being developed and provided. 4 Among the most widely used LLM-based chatbots, ChatGPT (OpenAI) and Gemini (Google LLC; Alphabet Inc) have emerged as prominent tools in health care information delivery. 2 ChatGPT, introduced in 2022, has evolved through several versions, with GPT-4 and GPT–4 Omni offering enhanced reasoning and multimodal capabilities.9,13 Google's Gemini, initially released in early 2024, has also undergone rapid development, transitioning from the Gemini 1.0 Ultra–based Gemini Advanced to the more advanced Gemini 1.5.1,12 ChatGPT and Google Gemini are LLM-based chatbots trained on a data set covering various subjects, including medical publications. 7 When asked questions, they provide well-structured, interactive, and easy-to-understand answers. Furthermore, chatbots constantly learn from user interaction, adapting themselves to become more responsive to user needs. 29 In recent years, LLM chatbots have become an increasingly popular source of information in the medical field, with many patients turning to these tools for guidance on health-related questions. 10
Despite their potential, LLM chatbots have also generated controversy. These models produce responses by recognizing patterns and relationships in their training data rather than through actual understanding or logical reasoning. 25 The variable quality of online health information used to train these models, combined with the risk of misinformation, can result in incorrect or irrational answers.19,25 Therefore, the outputs of LLM chatbots should be carefully evaluated for accuracy and scientific validity. In response to these concerns, researchers have begun to assess their accuracy and consistency in medical contexts.21,22,28,34,36 Several studies show that LLMs such as ChatGPT and Google Gemini provide satisfying answers for certain subjects, but their reliability across all medical topics remains uncertain.15,23,28
Over the past few decades, high tibial osteotomy (HTO) has gained popularity for the treatment of medial compartment osteoarthritis of the knee, particularly in young and active patients with varus deformity.17,20 Recent studies reported that HTO using a locking plate led to satisfactory clinical outcomes, proper correction maintenance, and successful osteotomy-site healing in most patients.18,26
Its clinical importance continues to increase, particularly as the demand for postponement of the need for total knee arthroplasty (TKA) rises. 26 While several studies have evaluated the performance of LLM chatbots in answering questions related to common orthopaedic surgeries such as TKA and anterior cruciate ligament reconstruction,6,16,36 no previous study has examined their performance in the context of HTO. Therefore, we aimed to determine whether current LLMs can accurately provide information regarding this increasingly common but less frequently studied orthopaedic procedure. We hypothesized that ChatGPT (GPT-3.5, GPT-4, and GPT–4 Omni) and Google Gemini (Gemini Advanced and Gemini 1.5) would provide accurate and relevant responses to frequently asked questions (FAQs) about HTO, with no significant differences between models across evaluation criteria.
Methods
Question Curation
We asked 5 different chatbot models 35 common questions about HTO and evaluated their answers. Typing “high tibial osteotomy” into Google brought up the "people also ask" box, which generated a list of FAQs. Google “people also ask” features reflects frequently searched, real-world patient inquiries in a reproducible manner.11,30 The use of the “people also ask” function has been adopted in previous LLM evaluation studies for similar purposes.5,19,36 To ensure academic transparency and clinical relevance, all FAQs were reviewed and finalized by 2 board-certified orthopaedic surgeons with >10 years of clinical experience in knee surgery. Inclusion criteria were questions directly related to HTO that patients might commonly ask. Exclusion criteria included questions unrelated to HTO and duplicate or highly similar questions, to avoid redundancy. In this study, questions were categorized into 6 domains based on their primary intent, following frameworks used in previous chatbot evaluation studies. For example, questions about surgical definitions, procedures, and types were included in the “general and procedure related” category; questions about the appropriateness of HTO for specific patients and expected outcomes were categorized as “indications and outcomes”; questions about potential disadvantages, complications, and perioperative risks were classified under “risks and complications”; questions about healing time or recovery were categorized under “pain and postoperative recovery”; questions about postoperative timelines for activities such as walking were included in “specific activities after surgery”; and questions about other treatment options, different surgical techniques, and new technologies were categorized under “alternatives and variations” (Table 1).
Categories and Frequently Asked Questions
Response Generation by LLM Chatbots
The FAQs were input into both ChatGPT (GPT-3.5, GPT-4, and GPT–4 Omni) and Google Gemini (Gemini Advanced and Gemini 1.5) to obtain the results for comparison. The questions were asked and answered from July 12 to 14, 2024 (ChatGPT-3.5, ChatGPT-4, ChatGPT–4 Omni, and Gemini Advanced) and on September 26, 2024 (Gemini 1.5). Because artificial intelligence (AI)–based chatbots adapt to user feedback from previous responses and provide different answers that reflect the user when generating subsequent answers, only the initial responses were considered in the evaluation. After completing a response to a question, the “new chat” button was pressed to open a fresh chat window and submit a new question.
Expert Evaluation
The responses of the chatbot models—GPT-3.5, GPT-4, GPT–4 Omni, Gemini Advanced, and Gemini 1.5—were assessed for both accuracy and relevance. Factual accuracy was defined as how scientifically accurate the responses were and whether they reflected the most up-to-date information. Relevance indicated how useful and effective the responses were in addressing the questions. Each response was rated on a 5-point Likert scale. 31 Each evaluation scale definition was clearly listed at the top of the scoring sheet to ensure raters could refer to them during scoring (Tables 2 and 3). When the chatbot avoided answering, it was assigned a score of 1. Scoring was performed by the same 2 independent orthopaedic surgeons, possessing >10 years of clinical orthopaedic experience. Before scoring, they participated in a briefing session to review the evaluation scales and practiced with sample answers to ensure consistent understanding of the criteria.
Likert Scale for Grading Factual Accuracy
Likert Scale for Grading Relevance
To ensure the raters were unable to distinguish between the different chatbots, all generated responses were formatted into plain text. Identifying features including hyperlinks, images, or citation markers were removed or reformatted to prevent any model-specific clues. This ensured that the assessment focused solely on the accuracy and relevance of the textual content itself. The appropriateness of images or the correctness of cited references, which are model-specific presentation elements, were not included in the evaluation criteria. Furthermore, after anonymization, the plain text responses were randomly shuffled before being presented to the 2 independent orthopaedic surgeons for scoring.
Statistical Analysis
Each question was independently scored using a 5-point Likert scale, and scores were averaged. Ratings from the Likert scale were dichotomized. A score of ≥4 indicated an accurate or relevant response, while a score <4 was categorized as inaccurate or irrelevant. For accuracy, Likert scores were analyzed using 1-way analysis of variance with Tukey post hoc test to compare chatbot performance, as data followed a normal distribution. The percentage of relevant responses was calculated for each category. Because relevance data did not meet normality assumptions, the Kruskal-Wallis H test with Dunn post hoc test was used for group comparisons. Data analysis was conducted using R software Version 4.3.3 (R Foundation for Statistical Computing). A P value of <.05 was the threshold for statistical significance. To assess the consistency of scoring patterns between raters, the Pearson correlation coefficient and Cronbach alpha were calculated.
Results
Overall Performance
The mean accuracy of the ChatGPT models was 4.66 ± 0.64 for GPT-3.5, 4.66 ± 0.54 for GPT-4, and 4.94 ± 0.24 for GPT–4 Omni. For Google Gemini models, Gemini 1.5 had a mean accuracy of 4.86 ± 0.36, while Gemini Advanced had a significantly lower accuracy of 3.83 ± 1.40; P < .001) (Figure 1). The relevance analysis revealed a significant difference (P < .001) between Gemini Advanced (85.7%) and the other chatbots (100%). Gemini Advanced avoided answering 5 out of 35 questions (Figure 2). Consistency between raters was excellent, with a Pearson correlation coefficient of 0.93 and a Cronbach alpha of .96.

Bar chart with overall accuracy of chatbot responses to common high tibial osteotomy questions for the 5 different chatbots tested (GPT-3.5, GPT-4, GPT–4 Omni, Gemini Advanced, and Gemini 1.5). The y-axis represents the mean score over various categories of questions, while the x-axis shows the different chatbot models. Error bars indicate SD for each chatbot's score, illustrating the variability in the data. Asterisk indicates statistical significance of P < .001.

Evasive responses of Gemini Advanced. Gemini Advanced responded to certain questions by recommending consulting a specialist or by stating that the question was beyond its capabilities.
General/Procedure Related
This category included 5 FAQs. All models performed reasonably well, with no statistically significant differences (P = .12) (Table 4). GPT–4 Omni achieved the highest score of 5.00 ± 0.00, while Gemini Advanced had the lowest score of 3.40 ± 1.67. As shown in Table 5, the relevance rate was 80.00% for Gemini Advanced, whereas all other chatbots maintained a rate of 100%.
Results of Model Performance Analysis by Category for Accuracy a
Data are presented as mean ± SD.
Indicates statistically significant difference.
Results of Model Performance Analysis by Category for Relevance a
Data are presented as n of relevant responses (%).
Indications and Outcomes
This category included 7 FAQs. As shown in Table 4, GPT-3.5, GPT–4 Omni, and Gemini 1.5 achieved the highest accuracy scores (5.00 ± 0.00), followed by GPT-4 (4.86 ± 0.38). In contrast, Gemini Advanced recorded a significantly lower score (3.29 ± 1.89) (P = .002). The relevance rate was 71.4% for Gemini Advanced, whereas all other chatbots demonstrated full relevance across all questions (Table 5).
Risks and Complications
Four FAQs about the risks and complications of HTO surgery were included in this category. No significant differences were shown between the scores of all models, indicating similar performance (P = .50) (Table 4). Gemini 1.5 achieved the highest accuracy score at 5.00 ± 0.00. Both GPT-3.5 and GPT-4 recorded the lowest score of 4.25 ± 0.96.
Pain and Postoperative Recovery
Six FAQs were included in this category. All models provided accurate responses with scores >4, showing no significant performance differences among them (P = .53) (Table 4). Scores ranged from 4.33 ± 0.82 (Gemini Advanced) to 4.83 ± 0.41 (Gemini 1.5 and GPT–4 Omni).
Specific Activities After Surgery
Six FAQs were included in this category. All models achieved accuracy scores >4, with no significant differences in performance among them (P = .09) (Table 4). GPT–4 Omni recorded the highest score of 5.00 ± 0.00, while Gemini Advanced had the lowest score of 4.17 ± 0.75.
Alternatives and Variations
This category included 7 FAQs. The accuracy scores in this category differed significantly between chatbots, with GPT-3.5 and GPT–4 Omni scoring 5.00 ± 0.00, GPT-4 and Gemini 1.5 scoring 4.86 ± 0.38, and Gemini Advanced scoring 3.57 ± 1.81 (P = .015) (Table 4). This large difference was primarily driven by the markedly lower performance of Gemini Advanced compared with the other models. A relevance rate of 80.00% was achieved by Gemini Advanced, whereas all other chatbots achieved a rate of 100% (Table 5).
Discussion
The major findings of our study demonstrated that ChatGPT models (GPT-3.5, GPT-4, and GPT–4 Omni) and Gemini 1.5 provided highly accurate responses to HTO-related FAQs, with mean accuracy scores ranging from 4.66 to 4.94 and 100% relevance across all categories. In contrast, Gemini Advanced exhibited significantly lower accuracy (3.83 ± 1.40; P < .001) and reduced relevance (85.7%), with the greatest variability among its responses. These results indicate that the reliability of LLM chatbots is not uniform, underscoring the need for careful selection and validation before clinical application.
In terms of accuracy, category-specific analysis reveals that all chatbots provided accurate answers without significant differences in the categories of "general/procedure related," "risks and complications," "pain and postoperative recovery," and "specific activities after surgery". Gemini Advanced showed noticeable underperformance in the "indications and outcomes" and "alternatives and variations" categories (Table 4). In the "indications and outcomes" category, all models except for Gemini Advanced achieved highly accurate results when answering questions related to medical indications, suggesting that these models possess a strong understanding of when a particular treatment or procedure is appropriate. Similar to the results for the "indications and outcomes" category, most models demonstrate strong performance in the “alternatives and variations" category, suggesting a good understanding of the variations in treatment approaches or disease presentation. It is typically assumed that newer and more advanced chatbots will yield better performance.24,32 In this study, we found that the latest model of ChatGPT (4 Omni) consistently achieved the highest accuracy scores in most categories, highlighting its ability to handle general medical inquiries and provide accurate information related to recovery and activity recommendations. However, the differences in accuracy scores among the various ChatGPT versions tested were not statistically significant. Gemini 1.5 significantly outperformed Gemini Advanced, indicating an improvement in accuracy with the newer generation of this chatbot.
In terms of relevance, consistent with the findings on accuracy, only Gemini Advanced recorded less than perfect relevance (85.7%). Notably, Gemini Advanced showed low relevance in the categories of "indications and outcomes" and "alternatives and variations." In contrast, all other chatbots achieved 100% relevance across all responses. These results indicate that all chatbots, except for Gemini Advanced, were able to consistently demonstrate both accuracy and relevance in their responses.
The relatively low accuracy and relevance of Google Gemini Advanced were influenced both by its lower accuracy in provided answers compared with other chatbots and by its tendency to avoid answering certain questions, which were assigned the lowest score. Representative responses to the same question from Gemini Advanced (low rated) and GPT–4 Omni (high rated) is provided in Appendix Table A1. Regarding avoidance, Gemini Advanced either recommended consulting with a specialist or stated that the question was beyond its capabilities for 5 out of the total of 35 questions posed (Figure 2). The questions that Gemini avoided answering are characterized by their potential answers influencing decisions directly related to medical treatment. This may reflect a more conservative and constrained design philosophy for the Gemini Advanced model, in comparison with other models tested. While this approach seems like it should help minimize risks from misinformation, it may also lead users to look elsewhere for more direct guidance. In contrast, less restrictive models such as the various versions of ChatGPT provided direct answers, albeit with disclaimers encouraging users to consult an expert (eg, "These guidelines are general and may vary depending on individual circumstances. Always follow your surgeon's specific recommendations").
In addition to these findings, a direct comparison of Google's Gemini and ChatGPT revealed distinct approaches to information presentation. Specifically, Gemini enhanced the user experience by meticulously citing sources and incorporating relevant images, which improved both the reliability and the readability of the information provided (Appendix Table A2). This attention to detail could make Gemini a valuable tool for users who value transparency and concise information in their AI interactions. Conversely, we found that ChatGPT provided more detailed and lengthy answers. The latest version of ChatGPT exhibited a tendency to generate more voluminous output (Appendix Table A3). This voluminous output sometimes decreased the proportion of relevant information, potentially confusing patients. Also, other research has shown that LLM chatbots can sometimes give incorrect or misleading information, even creating believable but false statements. This issue has been called "artificial hallucination."3,14,35 However, in our study, no hallucinations were detected. While these findings highlight the reliability of chatbot-generated responses within the scope of HTO-related questions, it is difficult to conclude that this applies to all medical topics.
These findings suggest that chatbots could be integrated into clinical workflows to enhance patient care. For example, they could provide previsit education about HTO, support shared decisionmaking during consultations by explaining surgical options in simple language, and offer postoperative guidance to reinforce rehabilitation. Such integration may improve patient understanding, streamline clinic visits, reduce clinician workload, and enhance overall patient experience. However, ongoing advancements in AI models are needed to enhance their clinical relevance and ensure the safe and accurate delivery of medical information.
Limitations
There are some limitations to our study. First, chatbot models are constantly being updated, improved, and refined. In the case of Open AI, since the introduction of GPT-1 in June 2018, development has continued at a pace with the intervals between new models and updates becoming increasingly shorter. Google, on the other hand, first introduced its LLM in the form of Bard in 2023, followed by Gemini in February 2024. Since then, Google has implemented >20 updates with the frequency of their updates also increasing. As changes to these chatbots occur rapidly, continuous monitoring and evaluation are necessary. Therefore, our results should be interpreted with caution while we also acknowledge that the current performance of the various chatbots may be different from our results. Second, because this study only dealt with specialized medical questions, it is difficult to conclude that the same result would be obtained for general questions. We exclusively evaluated the various chatbots’ ability to provide high-quality answers to FAQs specifically related to HTO surgery. As such, our results do not provide a definitive answer on the overall performance of each model in addressing general questions. Third, this study exclusively evaluated chatbot responses in English. Because the performance of AI chatbots can significantly vary across different languages, 27 our findings might have limitations when being generalized to non–English speaking populations. Finally, this study assessed chatbot performance only based on expert evaluation, without evaluating patient comprehension or satisfaction. In addition, the accuracy assessments were conducted by only 2 orthopaedic surgeons, and intrarater reliability was not evaluated, which may limit the robustness of the findings. Future studies involving direct interactions between chatbots and patients are needed to evaluate the usability, reliability, and effectiveness of chatbots in real-world clinical settings.
Conclusion
This study evaluated the accuracy and relevance of AI-based chatbots in response to common questions related to HTO. GPT-3.5, GPT-4, GPT–4 Omni, and Gemini 1.5 successfully provided highly accurate and relevant responses. However, Gemini Advanced exhibited lower accuracy and relevance than the other chatbots, highlighting a clear performance gap. These findings suggest that highly proficient LLM chatbots hold substantial promise for enhancing patient education and clinical information delivery. To translate these findings into clinical workflows, future studies will need to evaluate patient comprehension and satisfaction with chatbot-provided information, as well as to examine their effect on clinical efficiency and shared decisionmaking.
Footnotes
Appendix
Examples of Questions With More Voluminous Output by ChatGPT–4 Omni Than Gemini 1.5
| Question: What is an HTO? | |
| Gemini 1.5 | An HTO is a surgical procedure that realigns the knee joint. It involves cutting and reshaping the tibia (shinbone) to shift the weightbearing load away from the damaged portion of the knee. This procedure is primarily used to treat unicompartmental osteoarthritis of the knee, which is arthritis that affects only 1 side of the knee, typically the inner (medial) side. By realigning the knee, an HTO can help to • Relieve pain • Improve joint stability • Slow the progression of arthritis • Delay or prevent the need for a total knee replacement HTO is typically considered for patients who are • Younger and more active • Have good range of motion in the knee • Have no significant ligamentous instability • Have arthritis primarily on 1 side of the knee It is important to note that HTO is not suitable for everyone, and your doctor will carefully evaluate your individual condition to determine if it is the right treatment option for you. |
| GPT–4 Omni | An HTO is a surgical procedure aimed at treating knee arthritis, specifically in the medial (inner) compartment of the knee. This procedure is commonly performed on younger, active patients who have osteoarthritis localized to 1 side of the knee and wish to delay total knee replacement. Here's a detailed overview: • Purpose: - Redistribution of load: HTO is used to realign the knee joint and redistribute weight away from the damaged, arthritic part of the knee to the healthier part. - Pain relief: by shifting the weightbearing axis, it helps reduce pain and slows the progression of arthritis. • Indications: - Medial knee osteoarthritis: typically for patients with medial compartment osteoarthritis. - Varus deformity: often performed on patients with a varus deformity (bow-leggedness) to correct the alignment. - Active lifestyle: suitable for younger, active patients who wish to maintain an active lifestyle and avoid or delay total knee replacement. • Procedure: 1. Incision and exposure: an incision is made on the upper part of the tibia (shinbone), just below the knee. 2. Bone cut: the surgeon makes a precise cut in the tibia. 3. Realignment: the cut bone is then carefully realigned to shift the weightbearing axis of the leg. This can be done by removing a wedge of bone (closing-wedge osteotomy) or adding a wedge (opening-wedge osteotomy). 4. Fixation: the realigned bone is stabilized using metal plates, screws, or other fixation devices to allow it to heal in the new position. • Recovery: - Rehabilitation: after surgery, a rehabilitation program involving physical therapy is crucial to regain strength and mobility. - Weightbearing: initially, weightbearing is limited, and patients may use crutches. Gradual return to full weightbearing is guided by the surgeon. - Healing time: complete recovery can take several months, with the bone typically healing in about 3 to 6 months. • Benefits: - Pain reduction: significant pain relief in the affected knee compartment. - Delay of knee replacement: can delay the need for total knee replacement by several years. • Risks and considerations: - Infection: as with any surgery, there is a risk of infection. - Nonunion or malunion: risks related to bone healing, such as the bone's not healing properly or in the wrong position. - Nerve or blood vessel injury: there is potential for damage to surrounding nerves or blood vessels. HTO is a valuable option for selected patients, allowing for improved function and pain relief while preserving the knee joint for as long as possible. |
| Question: What are the benefits of HTO? | |
|---|---|
| Gemini 1.5 | Here are the key benefits of HTO: • Pain relief: HTO effectively reduces pain caused by unicompartmental osteoarthritis (arthritis affecting only 1 side of the knee). By realigning the leg, pressure is shifted off the damaged portion of the knee, alleviating pain. • Improved function: with reduced pain, patients experience improved knee function and mobility. Activities such as walking, climbing stairs, and participating in certain sports become easier. • Delaying knee replacement: HTO can significantly postpone the need for total knee replacement surgery. This is particularly beneficial for younger, active patients who want to preserve their natural joint for as long as possible. • Faster recovery: compared with knee replacement, HTO typically involves a faster recovery time. Patients can often return to normal activities sooner. • Bone preservation: HTO preserves the natural knee joint and avoids the need for artificial implants. Additional considerations: • Ideal candidates: HTO is most successful in patients with - Unicompartmental osteoarthritis (typically on the inside of the knee) - A varus deformity (bow-leggedness) - Relatively healthy cartilage in the unaffected parts of the knee - A reasonable level of activity and fitness Not suitable for everyone: HTO may not be appropriate for patients with - Advanced, widespread arthritis - Inflammatory arthritis - Significant ligamentous damage - Obesity Overall, HTO offers a valuable treatment option for individuals with specific knee conditions. It can provide pain relief, improve function, and delay the need for more invasive procedures. |
| GPT–4 Omni | HTO is a surgical procedure commonly used to treat knee osteoarthritis, particularly in younger, active patients who have malalignment of the knee. The procedure involves cutting and reshaping the tibia (shinbone) to relieve pressure on the knee joint. Here are the primary benefits of HTO: • Pain relief: one of the most significant benefits of HTO is the reduction of pain associated with knee osteoarthritis. By redistributing the weight and relieving pressure on the damaged part of the knee, patients often experience substantial pain relief. • Delays the need for knee replacement: HTO can postpone the need for total knee replacement surgery, which is particularly beneficial for younger patients. This is important because knee replacements have a limited lifespan, and delaying this surgery can reduce the likelihood of needing a revision surgery later. • Improved knee function: patients often experience improved knee function and stability after HTO. This can lead to a more active lifestyle and the ability to participate in activities that were previously limited because of pain and instability. • Preservation of natural knee joint: unlike total knee replacement, HTO preserves the patient's natural knee joint. This can be advantageous in maintaining a more natural range of motion and knee biomechanics. • Correction of knee alignment: HTO corrects the alignment of the knee joint, which can improve the overall mechanics of the leg. This realignment can reduce stress on the knee and other joints, potentially preventing the progression of arthritis. • Improved quality of life: by reducing pain and improving knee function, patients often report an improved quality of life. They can engage in daily activities with greater ease and comfort. • Less invasive than total knee replacement: HTO is less invasive than a total knee replacement, which typically means a shorter recovery time and less risk of complications associated with more extensive surgery. • Potential for cartilage regeneration: in some cases, HTO can create an environment that encourages cartilage regeneration. By offloading the damaged part of the knee, the healthier cartilage can bear the load, potentially slowing the progression of osteoarthritis. HTO is a valuable option for patients with specific types of knee problems, particularly those who are younger, active, and have knee malalignment contributing to their arthritis. The decision to undergo HTO should be made in consultation with an orthopaedic surgeon who can assess the individual patient's condition and determine the most appropriate treatment plan. |
HTO, high tibial osteotomy.
Final revision submitted July 3, 2025; accepted August 11, 2025.
One or more of the authors has declared the following potential conflict of interest or source of funding: This research was supported by Hallym University Research Fund 2024 (HURF-2024-46). AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Ethical approval was not required for this study, as it involved analysis of publicly available chatbot outputs and did not include any human subjects.
