Autonomous Artificial Intelligence in Diabetic Retinopathy: From Algorithm to Clinical Application

Abstract

Artificial intelligence (AI)-based algorithms are rapidly entering the health care field and have the potential to improve patient care. Our article focuses on the use of autonomous AI algorithms (ie, algorithms that can make clinical decisions without human oversight) in diagnostic imaging. In this article, we have used the example of diabetic retinopathy screening to highlight some important aspects to be considered by developers, policymakers, and end users when bringing autonomous AI algorithms into clinical practice. We have divided these aspects into (1) following the principles of safety, efficacy, and equity in all phases of development and implementation of the algorithm; (2) regulatory processes involving medical records, medical liability, and patient privacy; (3) cost and billing; and (4) the role of health care providers.

Keywords

artificial intelligence, augmented intelligence, clinical practice, implementation, regulation

Augmented or artificial intelligence (AI) algorithms are rapidly making inroads into health care. These can be assistive or autonomous. In assistive AI, a physician makes the final decision, whereas autonomous AI makes a clinical decision without physician involvement. Autonomous AI can be invaluable when specialists are not available. However, rigorous oversight and extensive validation are needed if autonomous AI is to be applied to clinical practice.

Many AI-based algorithms are being developed for the management of patients with diabetes mellitus (DM). While this article focuses on autonomous AI for diagnosis of diabetic retinopathy (DR), other applications in diabetes include artificial pancreas systems and fully automated closed-loop and hybrid closed-loop insulin delivery systems, which automate insulin delivery based on continuous glucose monitoring data.¹ These systems make diabetes management easier for patients while improving glycemic control. Although the goal is to make this process fully automated, most systems adopt a hybrid approach in which insulin must still be administered manually for meals. Some patients are also using “do-it-yourself” algorithms, where the patient is directly “in the loop” and combines an insulin pump, a continuous glucose monitor, and open-source algorithms to optimize glycemic control.²⁻⁴

An important vision-threatening complication in patients with DM is the development of DR. Early detection and treatment of DR can prevent blindness. Yet, adherence to screening guidelines is dismal, as low as 15%.⁵ With advancements in deep learning algorithms, it is now possible for an automated algorithm to detect DR changes from fundus photos with the same accuracy as an eye doctor.⁶ Autonomous AI-based detection enables real-time, point-of-care screening for DR, potentially improving access to screening services and preventing blindness.⁷

In April 2018, the Food and Drug Administration (FDA) authorized the first autonomous AI system for early detection of DR. This allowed non-eye care professionals to detect referable DR, in primary care settings, without requiring an eye care provider to interpret retinal images, filling a huge unmet need for patients with DM. Many resources were invested in vetting the algorithm, conducting a pivotal clinical trial to establish safety and efficacy and the exact indications for which nonexperts in the real-world setting could use the algorithm.⁸ In this article, we focus on the aspects involved in bringing an autonomous AI algorithm into clinical practice. Although these principles will apply to other autonomous AI algorithms in the pipeline, we share our experiences and lessons learnt from introducing an autonomous AI algorithm for early detection of DR.

(1) Principles of Safety, Efficacy, and Equity (SEE) as described by the American Medical Association (AMA).⁹ Adherence to the SEE principles is best applied to all phases of autonomous AI: design, development, validation, deployment, and post-market monitoring.

Safety (or do no harm) is measured using the sensitivity metric, that is, the proportion of patients with disease who are diagnosed correctly. Early use of computer-aided detection to identify suspicious findings in mammograms serves as a cautionary tale on the importance of rigorously assessing the accuracy of computational algorithms in real-world settings. Fenton et al¹⁰ compared the accuracy of screening mammography among centers with and without computer-aided detection and found that computer-aided detection was associated with significantly lower accuracy for the detection of malignancy. Often, autonomous AI will be rigorously validated for specific clinical indications and will not replace a general exam. For example, autonomous AI for DR does not replace an eye exam and will not identify other ocular problems, such as glaucoma or macular degeneration, which may coexist with DR.

Efficacy is measured with the specificity metric, that is, the proportion of patients without disease (“normals”) who are correctly identified as “normal,” which is a measure of the efficiency gained.

Equity means that AI needs to be safe and efficient, with high sensitivity and specificity, irrespective of variables such as race, ethnicity, sex, or age. This can be measured using the “diagnosability” and “bias” metrics. Diagnosability is the proportion of patients who get a valid diagnostic result as opposed to an uninterpretable result. The “bias” metric is measured by stratifying sensitivity and specificity by race, ethnicity, sex, and age so that any differences can be identified. Attention to equity will avoid biased results that may increase health care disparities.
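The four metrics described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in any trial: the `Case` record, its field names, and the choice to exclude uninterpretable results from sensitivity and specificity (counting them only against diagnosability) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Case:
    """One screened patient; every field name here is hypothetical."""
    has_disease: bool            # reference-standard label
    ai_positive: Optional[bool]  # None = AI returned no interpretable result
    subgroup: str                # e.g. a sex, age-band, race, or ethnicity label


def _rate(hits: int, total: int) -> float:
    """Return hits/total, or NaN when the denominator is empty."""
    return hits / total if total else float("nan")


def sensitivity(cases: List[Case]) -> float:
    """Safety: share of diseased patients (with a valid result) flagged positive."""
    diseased = [c for c in cases if c.has_disease and c.ai_positive is not None]
    return _rate(sum(bool(c.ai_positive) for c in diseased), len(diseased))


def specificity(cases: List[Case]) -> float:
    """Efficacy: share of disease-free patients (with a valid result) called normal."""
    healthy = [c for c in cases if not c.has_disease and c.ai_positive is not None]
    return _rate(sum(not c.ai_positive for c in healthy), len(healthy))


def diagnosability(cases: List[Case]) -> float:
    """Share of all patients for whom the AI produced an interpretable result."""
    return _rate(sum(c.ai_positive is not None for c in cases), len(cases))


def stratified_metrics(cases: List[Case]) -> Dict[str, Dict[str, float]]:
    """'Bias' check: sensitivity and specificity computed separately per subgroup."""
    groups: Dict[str, List[Case]] = {}
    for c in cases:
        groups.setdefault(c.subgroup, []).append(c)
    return {
        name: {"sensitivity": sensitivity(members), "specificity": specificity(members)}
        for name, members in groups.items()
    }
```

Comparing the per-subgroup values returned by `stratified_metrics` is one simple way to surface differences; a real analysis would also report confidence intervals and test whether any differences are statistically significant.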

In the pivotal trial for the IDx-DR retinal screening algorithm, the system was determined to have a sensitivity of 87.2% and a specificity of 90.7% in detecting more than mild DR among adults when compared with grading of wide-angle stereoscopic fundus photographs by the University of Wisconsin Fundus Photography Reading Center.⁸ In comparison, the sensitivity and specificity of a dilated ophthalmic exam by board-certified ophthalmologists were 34% and 100%, respectively, when compared with a similar, outcome-based reference standard of stereoscopic mydriatic fundus photos graded by a reading center.¹¹ Thus, the IDx-DR algorithm was more sensitive than experts in identifying early DR. Importantly, there were no significant differences in sensitivity or specificity by sex, race, or ethnicity, suggesting that it met the “bias” metric and can be successfully used across populations irrespective of their sex or race/ethnicity.⁸ For diagnostic imaging, an important variable to assess is “diagnosability” or “imageability.” This is the proportion of patients who have an interpretable image using the AI algorithm when compared with the reference standard. In the case of IDx-DR, 96% of patients were able to get a disease classification via the AI algorithm when compared with the reference standard of the fundus photography reading center.⁸

(2) Regulatory processes: To obtain FDA approval for the use of autonomous AI, a rigorous, prospective, preregistered clinical trial in accordance with Good Clinical Practice as described by FDA should be conducted.

Medical record: Output from autonomous AI is valid as a diagnostic report but is not currently considered part of the medical record. The State Medical Boards decide what does and does not constitute a medical record, and at present an autonomous AI output does not have the same medico-legal status as documentation by a physician unless it is signed off by a physician.

Liability: In most cases, autonomous AI will be used in the absence of specialists. In DR screening, for example, the primary care physician’s office will administer the test. However, this physician does not have specialized knowledge of the retina and should not be liable for an incorrect result. The creator of the AI or the company selling the AI should assume medical liability, as has been advocated by the AMA in its recent AI policy.⁹

Privacy: AI algorithms can accurately predict age, gender, and smoking status, among other variables, from fundus photos alone, and some consider retinal images to be protected health information. Securely holding large volumes of data will be vital as the use of AI in health care increases.

(3) Cost and Billing: Developing and rigorously testing an autonomous AI while adhering to the SEE principles mentioned above can be an expensive proposition. For example, in the case of IDx, it required raising $22 million, creating an ISO 13485 and 21 CFR 820 compliant organization, and 8 years of intense collaboration with FDA on rigorous validation of autonomous AI. To make this endeavor, which is likely to lead to long-term benefits for patients, cost effective, it is important to develop a mechanism for reimbursement. IDx received FDA authorization in April 2018. At that time there was no Current Procedural Terminology (CPT) code that allowed billing for autonomous AI, and a temporary code was disseminated by the American Academy of Ophthalmology.¹² In 2019, the CPT Editorial panel created for the first time a new CPT code for autonomous AI, retinal imaging with automated point-of-care.¹³ It is likely that the path forged by IDx, together with federal agencies, scientific and physician organizations, that led to the rigorous development, validation and payment of autonomous AI, has paved the way for others pursuing similar efforts.

(4) Role of Physicians: Physicians are at the frontlines of delivering care. They need to understand the limitations and applications of AI, be able to separate hype from reality where claims about AI are concerned, and learn how to critically evaluate clinical trials testing AI (see Table 1 for proposed levels of reference standards). The need for such training was recently recognized by AMA.⁹

Table 1.

Levels of External Validity of the Reference Standard Used in AI-Based Clinical Studies.

Level	Definition
A	A reference standard that either is a clinical outcome or an outcome that has been validated to be equivalent to clinical outcome, that is, a surrogate for a specific clinical outcome. This reference standard is derived from an independent reading center, where the clinicians or experts performing the reading are not otherwise involved in performing the study, with validated published protocols, and with published reproducibility and repeatability metrics. Level A reference standard is based on at least as many modalities as the test and ideally more.
B	A reference standard derived from an independent reading center with validated published reading protocols, and with published reproducibility and repeatability metrics. Level B reference standard has not been validated to be equivalent to a clinical outcome.
C	A reference standard created by adjudicating or voting of multiple independent expert readers, documented to be masked, with published reproducibility and repeatability metrics. A level C reference standard has not been derived from an independent reading center and has not been validated to be equivalent to a clinical outcome.
D	All other reference standards, including single readers and nonexpert readers. A level D reference standard has not been derived from an independent reading center, has not been validated to be equivalent to a clinical outcome, readers may not be masked, and readers do not have published reproducibility and repeatability metrics.

AI, artificial or augmented intelligence; Reference standard or “truth” refers to the well-defined disease state or patient outcome that is used to train the algorithm. Source: Michael D Abramoff MD, PhD.
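As a reading aid, the levels in Table 1 can be expressed as a small decision function. This is only a sketch under simplifying assumptions: the `ReferenceStandard` fields are hypothetical names, each criterion is reduced to a yes/no flag, and the Level A requirement about the number of imaging modalities is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ReferenceStandard:
    """Yes/no summary of a study's reference standard (hypothetical field names)."""
    validated_as_clinical_outcome: bool   # is, or is a validated surrogate for, a clinical outcome
    independent_reading_center: bool      # readers not otherwise involved in the study
    validated_published_protocols: bool   # reading protocols are validated and published
    multiple_masked_expert_readers: bool  # adjudication or voting by masked independent experts
    published_reproducibility: bool       # reproducibility and repeatability metrics published


def reference_standard_level(rs: ReferenceStandard) -> str:
    """Map a reference standard to level A, B, C, or D following Table 1."""
    reading_center_ok = (
        rs.independent_reading_center
        and rs.validated_published_protocols
        and rs.published_reproducibility
    )
    if reading_center_ok:
        # Levels A and B differ only in validation against a clinical outcome.
        return "A" if rs.validated_as_clinical_outcome else "B"
    if rs.multiple_masked_expert_readers and rs.published_reproducibility:
        return "C"
    # Everything else, including single or nonexpert readers.
    return "D"
```

For example, a study graded by a panel of masked experts with published reproducibility metrics, but without an independent reading center, would map to level C under this sketch.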

Potential Future Applications of AI in Diabetes

AI has the potential to help us with currently unmet challenges of managing patients with DM. One such challenge is the earlier identification of patients at risk for developing DM. Machine learning algorithms can leverage electronic health care data to learn the characteristics of patients with DM and potentially predict development of DM and its complications when patients are still in the subclinical stages.¹⁴,¹⁵ However, it has been much harder to achieve high SEE for these systems, as their inputs consist of noisy, subjective text-based data, rather than the more objective sensor-based data that is the basis for image-based autonomous AI. Recent advances in deep learning techniques have made it possible to extract information from retinal photos that was previously thought impossible to obtain from them, such as an individual’s age, gender, glycated hemoglobin, and blood pressure levels.¹⁶ This suggests that incorporation of retinal photos into prediction algorithms may further enhance our ability to diagnose diabetes in its earlier stages. However, before predictive algorithms are implemented in clinical practice, they need to be carefully validated to ensure that they are safe, efficient, and equitable, which can only be done via preregistered clinical trials that include the complete workflow and human factors design, and compare AI algorithms with outcome data. A recent study reported that a prediction algorithm widely used to identify patients with medical needs exhibited racial bias by using health care cost as a proxy for severity of medical illness. Black patients incurred lower costs for the same level of disease severity as whites and were incorrectly identified by the algorithm as having fewer health care needs.¹⁷ The possibility of bias needs to be carefully addressed in the design, validation, and implementation of algorithms.
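The proxy problem described above can be illustrated with a toy example. Everything in this sketch is synthetic and hypothetical: the `Patient` record, the group labels, the cost figures, and the use of a chronic-condition count as a stand-in for illness severity are assumptions chosen only to show the mechanism, not data from the cited study.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Patient:
    group: str               # hypothetical demographic label
    chronic_conditions: int  # stand-in for true illness severity
    annual_cost: float       # proxy that a cost-based algorithm would rank on


def select_top(patients: List[Patient], key: Callable[[Patient], float], n: int) -> List[Patient]:
    """Pick the n patients with the highest score under the given ranking key."""
    return sorted(patients, key=key, reverse=True)[:n]


# Synthetic cohort: at every severity level, group_2 incurs lower cost than group_1.
patients = [
    Patient("group_1", 5, 9000.0),
    Patient("group_2", 5, 6000.0),
    Patient("group_1", 3, 7000.0),
    Patient("group_2", 3, 4000.0),
    Patient("group_1", 1, 3000.0),
    Patient("group_2", 1, 1500.0),
]

# Ranking on cost fills both slots from group_1, even though a group_2
# patient is just as sick as the sickest group_1 patient.
selected_by_cost = select_top(patients, key=lambda p: p.annual_cost, n=2)

# Ranking on the severity stand-in selects one patient from each group.
selected_by_need = select_top(patients, key=lambda p: p.chronic_conditions, n=2)
```

Because the label being optimized (cost) differs systematically between groups at the same level of illness, the ranking inherits that difference even though group membership is never used as an input.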

In summary, AI algorithms offer the potential to improve patient care. However, autonomous AI algorithms, in particular, need to be assessed for safety, efficacy, and freedom from bias in well-designed clinical studies before they can be implemented for patient care. Physicians, regulatory authorities, and policymakers need to ensure that algorithms are rigorously validated and operated in clinical care according to their indications for use, and further research is needed to identify the real-world barriers encountered in implementing AI algorithms.

Footnotes

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Michael D. Abramoff, MD, PhD, FARVO, investor, board member, employee at IDx. Patents and patent applications: University of Iowa and IDx.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Roomasa Channa

Michael D. Abramoff

References

1. Contreras, Vehi. Artificial intelligence for diabetes management and decision support: literature review. J Med Internet Res. 2018;20(5):e10775.

2. Boughton, Hovorka. Advances in artificial pancreas systems. Sci Transl Med. 2019;11(484):eaaw4949.

3. Braune, O’Donnell, Cleal, et al. Real-world use of do-it-yourself artificial pancreas systems in children and adolescents with type 1 diabetes: online survey and analysis of self-reported clinical outcomes. JMIR mHealth uHealth. 2019;7(7):e14087.

4. Lee, Hirschfeld, Wedding. A patient-designed do-it-yourself mobile technology system for diabetes: promise and challenges for a new era in medicine. JAMA. 2016;315(14):1447-1448.

5. Benoit, Swenor, Geiss LS. Eye care utilization among insured people with diabetes in the U.S., 2010-2014. Diabetes Care. 2019;42(3):427-433.

6. Gulshan, Peng, Coram, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410.

7. Liew, Michaelides, Bunce. A comparison of the causes of blindness certifications in England and Wales in working age adults (16–64 years), 1999–2000 with 2009–2010. BMJ Open. 2014;4(2):e004015.

8. Abràmoff, Lavin, Birch, Shah, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1(1):39.

9. AMA. Augmented Intelligence in Health Care H-480.939. 2019. https://policysearch.ama-assn.org/policyfinder/detail/augmented%20intelligence?uri=%252FAMADoc%252FHOD.xml-H-480.939.xml. Accessed August 29, 2019.

10. Fenton, Taplin, Carney, et al. Influence of computer-aided detection on performance of screening mammography. N Engl J Med. 2007;356(14):1399-1409.

11. Lin, Blumenkranz, Brothers, Grosvenor DM. The sensitivity and specificity of single-field nonmydriatic monochromatic digital fundus photography with remote image interpretation for diabetic retinopathy screening: a comparison with ophthalmoscopy and standardized mydriatic color photography. Am J Ophthalmol. 2002;134(2):204-213.

12. AAO. CPT code for New Technology IDx-DR. 2019. https://www.aao.org/practice-management/news-detail/cpt-code-new-technology-idx-dr. Accessed August 31, 2019.

13. Shepard. Autonomous AI device receives historic CPT code. 2019. http://www.mddionline.com/autonomous-ai-device-receives-historic-cpt-code. Accessed August 7, 2019.

14. Dagliati, Marini, Sacchi, et al. Machine learning methods to predict diabetes complications. J Diabetes Sci Technol. 2018;12(2):295-302.

15. Olivera, Roesler, Iochpe, et al. Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes—ELSA-Brasil: accuracy study. Sao Paulo Med J. 2017;135(3):234-246.

16. Poplin, Varadarajan, Blumer, et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018;2(3):158-164.

17. Obermeyer, Powers, Vogeli, Mullainathan. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453.