Abstract
Background
Billing and coding for orthopaedic procedures is a complex process, with thousands of procedure codes and associated modifiers in existence. Foot and ankle surgery faces an additional challenge, as it exhibits among the highest variability in procedures performed of any orthopaedic subspecialty. This study aimed to investigate the capabilities of the top AI search engines in accurately identifying Current Procedural Terminology (CPT) codes for common foot and ankle procedures.
Methods
A comparative analysis of 3 publicly available AI search engines (ChatGPT, Bing, and Google Gemini) was performed, investigating their accuracy in generating CPT codes for common orthopaedic foot and ankle procedures. The generated CPT codes were recorded and compared with the codes generated by 3 fellowship-trained foot and ankle surgeons, which served as the reference standard. The Cohen kappa coefficient was used to determine the agreement of each AI platform with the surgeon coding reference standard.
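For readers unfamiliar with the statistic, Cohen's kappa corrects observed rater agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of this computation is shown below; the CPT codes in the example are purely illustrative and do not reflect the study data.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b.get(k, 0) for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: surgeon reference codes vs. AI-generated codes
# for 5 procedures (codes chosen for illustration only).
surgeon = ["28296", "28285", "27650", "28296", "28810"]
ai      = ["28296", "28285", "28270", "28292", "28810"]
print(round(cohen_kappa(surgeon, ai), 3))  # ≈ 0.524 (moderate agreement)
```

With 3 of 5 codes matching (p_o = 0.60) and a chance agreement of p_e = 0.16, kappa comes to roughly 0.52, which falls in the conventional "moderate agreement" band used in this study's interpretation.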
Results
The AI search engines correctly generated the appropriate CPT codes 44% of the time. Bing was the most accurate, generating correct CPT codes for 8 of the 13 procedures (62%) and partially correct codes for 3 of the 13 procedures (23%). ChatGPT demonstrated the worst accuracy, generating correct CPT codes only 23% of the time (3/13). Overall, the AI platforms demonstrated fair agreement with the reference standard (kappa = 0.201). Individually, Bing demonstrated moderate agreement (kappa = 0.405), Google Gemini demonstrated fair agreement (kappa = 0.255), and ChatGPT demonstrated poor agreement with the reference standard (kappa = 0.171).
Conclusion
Although the capabilities of AI show great promise for many industries, the results of this study urge caution in relying on AI to accurately generate orthopaedic foot and ankle procedure CPT codes.
Level of Evidence:
III, Comparative Study