Abstract
Background
Digital medicine is an important tool in the current healthcare landscape. Fever is an important reason for evaluating patients at first and second levels of care and a frequent symptom of diseases subject to epidemiological surveillance.
Objective
To evaluate the diagnostic effectiveness of various algorithms in detecting communicable diseases of epidemiological interest in febrile patients at Hospital General Regional No. 1, Cd. Obregón, Sonora.
Methods
An observational, descriptive, and retrospective study was conducted in a second-level hospital from 1 January 2022 to 31 December 2023, to determine Cohen's kappa and the sensitivity, specificity, positive and negative predictive values, precision and Youden's J index of diagnostic algorithms for 20 communicable diseases with respect to the doctors’ diagnoses.
Results
Diagnostic algorithms were applied to the data of 909 cases. The sensitivities of Mediktor®, an artificial neural network-based algorithm, a medical diagnostic algorithm and a composite diagnostic algorithm were 11.97%, 64.09%, 69.92% and 99.37%, respectively, and the corresponding specificities were 93.43%, 91.24%, 27.01% and 5.11%, respectively. The neural network-based method yielded the highest Youden's J index.
Conclusions
The medical diagnostic algorithm had the best sensitivity, whereas the specificity was greater for the two artificial intelligence algorithms.
Introduction
In recent years, artificial intelligence (AI) has seen rapid advancements in medical applications and today has had the greatest transformative effect among digital technologies worldwide, especially since the start of the coronavirus disease 2019 (COVID-19) pandemic.1,2 AI has great potential in the field of medicine, primarily in predictive analysis and support tasks for clinical decision-making. At present, AI is used to support clinical diagnoses, the early detection of neoplastic pathologies, treatment decision-making, robotic surgery and classification of medical emergencies.3,4
A growing number of studies, literature reviews and accessible information sources focus on ‘machine learning’, a discipline within computing that drives the creation of AI through methods such as artificial neural networks (ANNs). ANNs are learning algorithms encapsulated in computational models that are designed to mimic the function of neurons in the brain and to follow a series of rules that allow the execution of an activity; that is, they carry out a series of unambiguous steps that have an endpoint.5–7
The World Health Organization has recognized the potential of AI in the practice of public health and medicine, as it can strengthen the ability of health care providers to improve patient care. 8
One of the longest-lived medical applications based on ANNs, in use since its foundation in 2011, is Mediktor®, an AI algorithm in the form of a mobile application that can issue diagnoses ordered by probability. 9
A study conducted in 2017 published by the Spanish Society of Urgent and Emergency Medicine revealed 91.3% concordance between the diagnoses issued by the AI and the physicians responsible for the emergency service, which suggests that Mediktor is a reliable tool for helping to establish diagnoses in general. 10
The use of ANNs for the diagnosis of dengue has previously been documented in Mexico, with promising results from an epidemiological surveillance system based on operational definitions and laboratory confirmation. 11 In most infectious diseases, fever is considered a cardinal sign; it is a frequent reason for consultations at the first level of care and a cause for hospitalization at the second level of care. 12 The presence of fever alone indicates a very large list of differential diagnoses. 13 Therefore, in the challenge of establishing a timely diagnosis of communicable diseases of epidemiological interest, the clinical criteria considered by the treating physician will always be the first step, but operational definitions, direct algorithms and algorithms based on ANNs can be potential tools for assisting in patient management. Accordingly, the objective of this study was to evaluate the diagnostic effectiveness of various techniques (ANN-based algorithm, ‘Mediktor®’, direct medical diagnosis and composite algorithm) in detecting communicable diseases of epidemiological interest in patients with fever at Regional General Hospital No. 1, Cd. Obregón, Sonora. Given that Ho T-S et al. used 90% sensitivity and 85% specificity, 14 our hypothesis was that the diagnostic algorithms applied to patients with fever can achieve a sensitivity equal to or greater than 85% and a specificity equal to or greater than 85%.
Methods
Methodological design
This was a single-center, observational, descriptive and retrospective study.
Definition of the population
The model used in this study was based on that published by Ho TS et al.; 14 the construction of the algorithm is detailed in the supplementary material. The hospital medical records of patients who were evaluated via outpatient consultation, first contact, triage or emergency department visits and hospitalized with an infectious pathology of epidemiological interest at Hospital General Regional No. 1, Cd. Obregón, Sonora, from 1 January 2022 to 31 December 2023 were obtained through the unit's files, electronic medical records and the hospital platform of the digital health ecosystem. Patient records were included regardless of patient sex or age if the patient was evaluated in the emergency room and presented with fever among his or her clinical manifestations. Patients were excluded if they had a previous diagnosis of a chronic condition with episodes of fever or if their record was incomplete or missing.
Sample size
Nonprobabilistic sampling was carried out for consecutive cases from among 31,214 hospital admissions between 2022 and 2023 until the expected sample size (in terms of proportion) was reached. A previous pilot study that reviewed hospital admissions indicated that 24.7% of patients were admitted with or had a history of fever. Thus, with N = 31,214, Zα = 1.96 (95% confidence level), p = 0.247 and d = 3%, the sample size was calculated at 774 records, with an increase of 0.17 points.
Formula:
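The reported sample size is consistent with the standard finite-population formula for estimating a proportion. The following is a minimal sketch under that assumption; the function name `sample_size_finite` is illustrative and does not come from the study, and the formula in the docstring is assumed rather than confirmed by the source.

```python
def sample_size_finite(N: int, z: float, p: float, d: float) -> float:
    """Sample size for estimating a proportion in a finite population.

    Assumed formula (not confirmed by the source):
        n = N * z^2 * p * q / (d^2 * (N - 1) + z^2 * p * q),  with q = 1 - p
    """
    q = 1.0 - p
    numerator = N * z**2 * p * q
    denominator = d**2 * (N - 1) + z**2 * p * q
    return numerator / denominator


# Parameters as stated in the text: N = 31,214, Z = 1.96, p = 0.247, d = 3%
n = sample_size_finite(N=31_214, z=1.96, p=0.247, d=0.03)
print(round(n))  # rounds to the 774 records stated in the text
```

Under this assumption the unrounded value is slightly above 774, so rounding to the nearest record gives the stated figure.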
Specification of the variables of interest
Variables such as age, sex, education, occupation, clinical manifestations, clinical diagnosis and laboratory diagnosis were collected.
The dependent variable was the effectiveness of the ANN-based algorithm in diagnosing patients with fever, whereas the independent variables were the doctors’ diagnoses, the diagnoses of the Mediktor® application, the diagnoses of the medical algorithm and the diagnosis of the composite algorithm. The criteria for constructing the algorithms are summarized in the supplementary material (S1).
Algorithms trained
Mediktor® is an established app whose reported effectiveness is backed by work conducted at a Spanish hospital. The ANN was computed with the JavaScript library brain.js (https://unpkg.com/brain.js), which was run in the Chrome browser (Google, Inc.). The direct algorithm was based on the operational case definition used in epidemiological surveillance, which aims to detect cases, so its sensitivity is typically higher. For the composite algorithm, a combination of the direct algorithm with the ANN was used; the ANN was used to calculate the complement of the final diagnosis with the same JavaScript library. Finally, all the outputs were converted to 1 or 0, depending on whether the diagnosis was positive or not. In addition, the diagnosis percentage was established.
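The conversion of outputs to 1 or 0 and the combination of two methods can be sketched as follows. This is an illustration only: the 0.85 cutoff is the value reported in the Results, but the use of `>=` at the cutoff and the "positive if either method is positive" rule are assumptions, since the actual construction is described in the supplementary material (S1). The names `binarize` and `composite` and all sample values are hypothetical.

```python
from typing import Sequence

CUTOFF = 0.85  # cutoff reported in the Results; applying it with >= is an assumption


def binarize(scores: Sequence[float], cutoff: float = CUTOFF) -> list[int]:
    """Map each probability-like output to 1 (positive) or 0 (negative)."""
    return [1 if s >= cutoff else 0 for s in scores]


def composite(direct: Sequence[int], ann: Sequence[int]) -> list[int]:
    """Hypothetical combination rule: positive if either method is positive."""
    return [int(d or a) for d, a in zip(direct, ann)]


ann_scores = [0.91, 0.40, 0.85, 0.10]  # made-up ANN outputs
direct_flags = [0, 1, 0, 0]            # made-up direct-algorithm results
ann_flags = binarize(ann_scores)
print(ann_flags)                           # [1, 0, 1, 0]
print(composite(direct_flags, ann_flags))  # [1, 1, 1, 0]
```

An "either positive" rule of this kind would tend to raise sensitivity at the cost of specificity, but whether the study combined the methods this way is not stated in the text.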
Statistical analysis
For descriptive statistics, qualitative variables are summarized as absolute frequencies and percentages; quantitative variables are expressed as measures of central tendency. The hypothesis was tested with the chi-squared test, and performance was assessed with Cohen's kappa, sensitivity, specificity, positive predictive value, negative predictive value, precision and Youden's J index for the diagnostic algorithms, the diagnosis determined by the treating physician, the diagnosis established with confirmatory laboratory results and the diagnosis upon patient discharge. For all estimates, 95% confidence intervals were determined. To the greatest extent possible, the records were matched at a 1:ratio according to the different diagnoses. Receiver operating characteristic (ROC) curves were calculated by disease, age and season of the year.
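The performance measures named above can all be derived from a 2 × 2 table of algorithm result versus reference diagnosis. The following is a minimal sketch using the standard textbook definitions; the function name `diagnostic_metrics` and the toy counts are illustrative and are not taken from the study, and the software the authors used for these calculations is not stated.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard 2 x 2 diagnostic measures (textbook definitions)."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positives among reference positives
    specificity = tn / (tn + fp)   # true negatives among reference negatives
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / n       # overall proportion of correct classifications
    youden_j = sensitivity + specificity - 1
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_expected = p_yes + p_no
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "youden_j": youden_j,
        "kappa": kappa,
    }


# Toy counts, chosen only to make the arithmetic easy to follow
m = diagnostic_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

With these toy counts, sensitivity, specificity, PPV, NPV and accuracy are all 0.80, and both Youden's J and kappa are 0.60.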
Ethical considerations
This study protocol complies with the ethical guidelines established in the Regulations of the General Health Law on health research, published in the Official Gazette of the Federation on 6 January 1986, with the latest reform in force on 2 April 2014; according to Article 17 thereof, this is considered a risk-free investigation. 15
Given that this was a retrospective study without direct intervention with the selected subjects or with the operation of the medical unit, all the patient data were handled confidentially using only the medical record as part of our institutional epidemiological surveillance, and ethics review and informed consent were not required by the institutional review board. The protocol was submitted for evaluation by the Local Health Research Committee No. 2603 of the Mexican Institute of Social Security, who granted approval with registration number R-2024-2603-038 (May 22, 2024).
Results
General description of the observations
A total of 909 records were included in the study; 55.9% (508) were from women, and 44.1% (401) were from men. The subjects had a minimum age of one year and a maximum age of 94 years, with a mean of 35.1 years, a median of 31 years and a mode of 14 years (SD = 24.2). Regarding the highest level of education completed according to the records, the largest proportion of patients had completed high school, at 29.4% (267), followed by middle school at 23.1% (210), undergraduate education at 23% (209) and elementary school at 17.4% (158), while 7.2% (65) reported not having received formal education. With respect to occupation, 30.9% (281) of the records were from students, followed by employees at 24.1% (219) and housewives at 15.2% (138).
Each diagnosis was represented by 50 records (5.5%), with the exception of endocarditis (n = 12; 1.3%) and hand-foot-mouth disease (n = 47; 5.2%). The diagnoses included the following: a) respiratory diseases: COVID-19, pharyngotonsillitis, influenza, pneumonia, acute otitis, acute suppurative otitis, rhinopharyngitis, rhinosinusitis and tuberculosis; b) vector-borne diseases: dengue (dengue with warning signs, severe dengue and nonsevere dengue) and rickettsiosis; c) digestive diseases: hepatitis A; and d) other pathologies: hand-foot-mouth disease, meningitis, pyelonephritis, endocarditis and events supposedly attributed to vaccination or immunization (ESAVI).
Laboratory examinations were performed as a complementary diagnostic method, yielding a laboratory-confirmed diagnosis for 41.47% (377) of the cases; the samples of 14.9% (135) of the cases were discarded by the laboratory; and the remaining 43.7% (397) lacked laboratory studies showing evidence of the presence of an infectious etiology contributing to the diagnosis.
Evaluation of the diagnostic techniques
At a cutoff point of 0.85 (as established by the programmer), the Mediktor® software identified 76 positive cases and 256 negative cases of the 909 observations with respect to the doctors’ diagnoses, for an estimated sensitivity of 11.9%, a specificity of 93.4%, a positive predictive value of 80.8%, a negative predictive value of 31.4%, a precision of 0.36 and a Youden's J index of 0.05 (p < 0.05); thus, the null hypothesis was accepted, as the test did not reach the expected sensitivity and specificity values. Moreover, the Kappa index was 0.012, indicating slight agreement with the doctors’ diagnoses. The area under the ROC curve was 0.528 (95% CI 0.489–0.568) (Figure 1).

Figure 1. ROC curve of the Mediktor® algorithm.
At a cutoff point of 0.85, the ANN-based algorithm identified 407 positive cases and 250 negative cases with respect to the doctors’ diagnoses, for an estimated sensitivity of 64.1%, a specificity of 91.2%, a positive predictive value of 94.4%, a negative predictive value of 52.3%, a precision of 0.72 and a Youden's J index of 0.5 (p < 0.001); thus, the null hypothesis was accepted, as the expected sensitivity and specificity were not reached. The Kappa index was 0.174, indicating slight agreement. The area under the ROC curve, 0.828 (95% CI 0.801–0.856), was greater than 0.5, demonstrating the ability of the ANN-based test to discriminate patients correctly and reflecting the marked difference from the Mediktor® algorithm (Figure 2).

Figure 2. ROC curve of the artificial neural network algorithm.
At a cutoff value of 0.85, the medical diagnoses based on the operational case definitions of the epidemiological surveillance systems standardized for Mexico identified 404 positive cases and 74 negative cases with respect to the doctors’ diagnoses, for an estimated sensitivity of 69.9%, a specificity of 27.0%, a positive predictive value of 68.9%, a negative predictive value of 27.9%, a precision of 0.56 and a Youden's J index of −0.03 (p > 0.05), which is reflected in the area under the ROC curve, with a value of 0.537 (95% CI 0.496–0.579) (Figure 3).

Figure 3. ROC curve of the direct algorithm.
Finally, in the analysis of the composite algorithm, the estimated sensitivity was 99.4%, the specificity was 5.1%, the positive predictive value was 70.8%, the negative predictive value was 77.8%, the precision was 0.7 and the Youden's J index was 0.04 (p < 0.001). The kappa index was 0.03, indicating slight agreement. The areas under the ROC curve are summarized in Table 1.
Table 1. Sensitivity and specificity of the different diagnostic algorithms.
PPV: positive predictive value. NPV: negative predictive value.
Notably, of the 20 diseases, the direct medical diagnosis, ANN-based algorithm and Mediktor® achieved at least 85% sensitivity for 13, 6 and 2 diseases, respectively. However, in terms of specificity, the situation was reversed; the direct medical diagnosis, ANN and Mediktor® achieved at least 85% specificity for 2, 14 and 13 diseases, respectively (Table 2).
Table 2. Results of the algorithms by disease.
ESAVI: events supposedly attributed to vaccination or immunization; PPV: positive predictive value; NPV: negative predictive value.
Finally, the diagnoses of the different algorithms that yielded high values of sensitivity and specificity, as well as a Youden's J index close to 1, were analyzed individually. The results revealed that each of the algorithms could serve as support for the doctors’ diagnoses for certain diseases, but this evidence was limited for the relatively novel ANN-based algorithm (Table 2).
The ANN algorithm performed well in the identification of ESAVI, rickettsiosis, acute suppurative otitis media, rhinosinusitis, hand-foot-mouth disease, pyelonephritis, meningitis, pneumonia, tuberculosis and hepatitis A, with sensitivities and specificities greater than 75%. The results were different when the Mediktor® software was used for cases of acute otitis media, hand-foot-mouth disease and influenza, yielding values that did not meet the established cutoff point. However, the composite algorithm demonstrated greater sensitivity for most diseases but a lower specificity.
An analysis of each algorithm by age group was also performed (Table 3). Overall, the ANN performed best across all age groups. In the group aged 1 to 4 years, Mediktor® failed to identify true positives, making it practically useless in this age group, whereas the ANN showed excellent performance. In the groups aged 5 to 9 years and 10 to 19 years, the ANN was again the best algorithm, showing the best overall balance, whereas the direct algorithm produced many false positives. In older adults, Mediktor® was specific but weak in sensitivity. The analysis by season of the year showed that the ANN was the best algorithm because of its strong balance between sensitivity and specificity and its consistent accuracy across all seasons (Table 4). The ROC curves by disease, age and season of the year are displayed in the supplementary material (S2), where the ANN showed the best results for most diseases.
Table 3. Results of the algorithms by age.
PPV: positive predictive value; NPV: negative predictive value.
Table 4. Results of the algorithms by season.
PPV: positive predictive value; NPV: negative predictive value.
Discussion
In this study, four diagnostic algorithms were evaluated with a total of 909 medical records to assess their efficacy in detecting communicable diseases in febrile patients in a hospital located in northern Mexico, bordering the USA. The results are important but also reveal limitations in the application of these algorithms in clinical decision support.
The comparison among the diagnostic algorithms, the algorithms developed in the hospital and Mediktor® reveals a complex and multifaceted panorama in the application of AI in diagnostics. The analysis not only demonstrated the efficacy of the four compared methods but also highlighted significant limitations that affect their clinical applicability in real contexts. First, the ANN-based algorithm had a sensitivity of 64.09% and a specificity of 91.24%. These results indicate that although the algorithm did not achieve the desired sensitivity of more than 85.00%, it performed remarkably well in correctly discriminating between patients with and without fever-based communicable diseases. In contrast, Mediktor® demonstrated a sensitivity of 11.97%, highlighting the potential of ANNs in the medical field. This finding aligns with those in the recent literature describing the use of AI in clinical diagnosis but also indicates the urgent need to improve the sensitivity and specificity of these models to ensure more accurate diagnoses. Notably, the pathologies in which the predictions of the ANN were best have dissimilar clinical manifestations and overwhelmingly affect a specific apparatus or system (for example, rickettsiosis and hand-foot-and-mouth disease presented with rash; gastrointestinal infections were observed in hepatitis A, rickettsiosis and pyelonephritis; urinary involvement was seen in pyelonephritis). Regarding dengue, the results were consistent with a previous study, where the direct algorithm had better sensitivity, with the particularity that in the present study it was for non-severe dengue and severe dengue. 11 The ANN-based method had the highest Youden's J index. By contrast, with a specificity of 27.01%, the clinical utility of the direct medical algorithm is questionable.
In this sense, it should not be used as a confirmatory test, which is why the study seeks to gain an overview of the algorithms for choosing the best option, but always accompanied by medical judgment. Additionally, an investigation carried out when Mediktor® was first released 10 showed a 91.3% concordance with the clinical diagnosis as well as a sensitivity and specificity higher than 92.0%. These results indicated a greater alignment in the classification of diagnoses, which suggests that although Mediktor® has high reliability, it focuses on a different sample of pathologies that include medical and surgical cases. This difference in approach could influence the results, especially considering that the diseases studied were limited to those of epidemiological interest in the country. Also, the higher percentage of concordance in Mediktor® was probably because it was programed for a more common pathology in the location where the cases were collected; however, when applied to another population, the response will be different, due to the prevalence of the conditions.
Peculiarly, among the respiratory diseases diagnosed, Mediktor® achieved the highest sensitivity, specificity and PPV for pneumonia. One reason for this result is that the main manifestations considered when making the diagnosis were the presence of cough with purulent expectoration, which is not present in the criteria of the other diagnoses. In contrast, the performance of the direct clinical diagnosis algorithm administered by the health personnel for pneumonia was low, which highlights the need for the integration of more clinical or syndromic data.
In the analysis of the direct and composite algorithms, the results revealed notable differences. The composite algorithm achieved a sensitivity of 99.4%, indicating its ability to identify a high proportion of positive cases. However, its ability to avoid false positives was limited. This situation raises serious concerns about the clinical applicability of the composite model, since a high number of false positives could lead to unnecessary treatments. The low specificity of 5.11% for the composite algorithm indicates massive overdiagnosis; this may be because the cases studied met operational definitions of conditions subject to epidemiological surveillance, which has led hospital physicians to diagnose diseases that are required to be reported. The composite algorithm's extreme sensitivity and low specificity also suggest poor real-world applicability. High sensitivity is useful for a screening test, which should be complemented by a confirmatory test; it is therefore recommended not to rely on it fully as an AI tool but rather to use it when information is entirely lacking or, as has been common, in triage, which reflects the better sensitivity of the direct algorithm. Despite its limitations, a positive aspect of the composite algorithm was its notably greater sensitivity compared with the other algorithms. It is necessary to consider, however, that the ANN is more efficient, as it had undergone greater training and includes more perceptron layers. Attempts have been made to combine various AI and machine learning techniques; however, because several variables are included, the results have been very heterogeneous. 16 ANNs have several advantages, including a high capacity to learn and generalize and the ability to handle imprecise, confusing, noisy and probabilistic information. 17 However, it is essential that clinicians complement automated diagnoses with critical analysis and a patient-centered approach.
The interaction between health personnel and automated algorithms should be seen as a collaborative process. Human interaction in the diagnostic process is vital to maintaining empathy and understanding in health care and should not be replaced by automation.
The analysis of the results according to the individual pathologies reveals one of the strengths of the study: a high level of coincidence, sensitivity and specificity for 10 of the 20 probable diseases analyzed with the algorithm. This information is crucial, especially in the context of outbreaks and epidemics or in endemic regions where certain diseases are more prevalent. The effectiveness of the algorithms in detecting diseases such as tuberculosis and rickettsiosis is particularly important, considering the current situation in multiple regions of the country, such as northwestern Mexico, where these pathologies are endemic.
Most published studies of AI are based on a single disease; however, similar to the present study, Tran 18 described the use of machine learning algorithms in detecting patients with various communicable diseases, such as malaria, Lyme disease, hepatitis C, sepsis, tuberculosis and meningitis. Notably, a comparison of the performance of models for the latter two diseases revealed limitations in the diagnosis of tuberculosis and differences in the variables considered by the algorithms for detecting patients with meningitis. For example, an algorithm including age, leukocyte count and serum glucose in the identification of meningitis patients yielded a sensitivity of 88.3%, 19 whereas another ANN model that included age, sex, leukocyte count, serum glucose, and the results of cytochemical cerebrospinal fluid analysis achieved a sensitivity of 98% and 100% specificity. 20 These results are in contrast to those of the present study for both diseases, mainly because of the variables included in the algorithm; the diagnosis of meningitis was based on meningeal and encephalic clinical manifestations and complemented with laboratory findings, which, in fact, would explain why the physician was better able to diagnose the disease than the AI model. For tuberculosis, the sensitivity of the ANN was 80.0%, and the specificity was 90.0%, while the medical diagnostic algorithm performed the worst. This is notable because the main criterion for diagnosing tuberculosis was cough for more than 15 days, which is also the main criterion of the standardized operational case definition in Mexico, and because the area in which the study hospital is located is an area of Mycobacterium transmission. Unexpectedly, the medical diagnostic algorithm performed poorly for dengue with warning signs and pneumonia, with very low estimated sensitivity values.
In this sense, it should be noted that for 13 diseases, the medical diagnostic algorithm presented a sensitivity of at least 85%; although the AI algorithms achieved this level of sensitivity for fewer diseases, they achieved higher specificity for dengue with warning signs and pneumonia; moreover, because these algorithms were based on symptoms as major and minor criteria for decision-making, they achieved greater sensitivity for these diseases. In these pathologies where cough is a cardinal sign, the semiology needs to be expanded. Cough can be classified by its duration (acute, subacute or chronic) and by the character of the sudden expulsion of air (dry or wet); for example, most persons with pulmonary tuberculosis or chronic obstructive pulmonary disease experience a chronic, wet cough, whereas in patients with COVID-19 or an exacerbation of asthma, acute and dry cough predominates. In the present study, the time variable was included for tuberculosis, the type of cough for pneumonia and tuberculosis, and simply the presence of cough for COVID-19 and pharyngotonsillitis, which reflects a limitation of the study.
Models with comparatively high area under the ROC curve values (e.g. 0.90) may still be too weak for clinical application, since this value implies that, given a randomly chosen positive case and a randomly chosen negative case, the negative case is ranked higher than the positive case 10% of the time. Until more accurate machine learning models can be developed, AI-based clinical diagnostics should play only a minor role in clinical settings. 21
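The ranking interpretation of the area under the ROC curve can be made concrete with a small sketch. The function name `auc_by_ranking` and the scores below are made up for illustration and are unrelated to the study data; the function computes the area as the share of positive and negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
from itertools import product
from typing import Sequence


def auc_by_ranking(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0   # positive case ranked above the negative case
        elif p == n:
            wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


pos = [0.9, 0.8, 0.7, 0.6, 0.3]  # made-up scores for diseased cases
neg = [0.5, 0.2]                 # made-up scores for non-diseased cases
print(auc_by_ranking(pos, neg))  # 0.9
```

Here 9 of the 10 pairs are ordered correctly; the single misranked pair (0.3 versus 0.5) corresponds to the 10% of comparisons in which a negative case outranks a positive one.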
The sensitivity, specificity and PPV for COVID-19 obtained in the present study reflect the complementing of the clinical diagnosis with the identification of actually ill patients, which contrasts with the findings of a study on an ANN model that achieved 94.3% sensitivity and 98.1% specificity; 22 the differences could be explained by the fact that that study included the data from 4096 patients from 20 medical units, whereas the present study included the data of only 50 COVID-19 patients from a single center.
In summary, operational definitions of cases subject to epidemiological surveillance were used. These definitions are designed to be more sensitive in order to detect the largest number of patients. Therefore, the medical algorithm tends to be more sensitive, while the neural network used did not give specific weight to signs and symptoms. Mediktor® uses an AI aimed at confirmatory clinical diagnosis, so its specificity is high but its sensitivity is not. However, this will also depend on the prevalence of the disease in the population where it is applied.
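The dependence on prevalence can be illustrated with Bayes' rule. The sketch below holds sensitivity and specificity fixed at the overall values reported for the ANN-based algorithm (64.09% and 91.24%) and varies the prevalence; the function name `predictive_values` and the three prevalence values are arbitrary illustrations, not figures from the study.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


SENS, SPEC = 0.6409, 0.9124  # overall ANN values reported in the abstract

for prev in (0.05, 0.30, 0.70):  # arbitrary prevalence levels for illustration
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the same sensitivity and specificity, the positive predictive value falls sharply as prevalence decreases while the negative predictive value rises, which is why results obtained in one population may not transfer to another.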
The variability in the effectiveness of the algorithms underscores the need to personalize diagnostic models according to the characteristics of the population and local disease prevalences. This variability also indicates that some algorithms may not be suitable for certain clinical contexts, which raises questions about the need to adjust and personalize the diagnostic algorithms according to the demographic and epidemiological characteristics of the population served. In this way, the title of the article by Abramova, 23 ‘Where the past helps the future’, suggests that the creation of algorithms necessary for training AI would lead to more accurate results but be incredibly challenging.
Another important item of note from this study is the importance of epidemiological surveillance. Fever, as a key indicator of infectious diseases, requires careful monitoring and the application of operational definitions that facilitate rapid and informed decision-making; thus, a robust surveillance system will benefit individual patient care and public health.
The strength of the present study is that four different methods were applied for the simultaneous diagnosis of various infectious diseases. Few studies have focused on the use of algorithms in communicable diseases, and often only one disease was investigated in each study. The differential bias due to the frequency of the pathologies was controlled with matching, and epidemiological factors typical of the disease in question were included in the criteria of the algorithms. Similarly, an analysis was conducted by age group and season to identify changes in these two situations.
The limitations of the study were as follows: 1) a large number of records were generated by a second-level doctor; 2) data from entire years were included without stratification by epidemic period; 3) no age-based analyses were conducted; 4) there was differential bias among the criteria for the combined algorithm, primarily laboratory and/or official diagnostic criteria, for some diseases; 5) the study was conducted in a single medical unit, so it is still premature to apply the algorithms routinely; although formal random sampling was not performed, there was no convenience sampling, and all patients who met the study criteria had an equal opportunity to enter the study; and 6) our selection of 85% sensitivity and 85% specificity may not be suitable in all epidemiological settings; this selection of parameters was made because there were various pathologies to study. Therefore, the sensitivity, specificity and predictive values of our algorithms depend upon the distribution of other clinical diagnoses in the study population. Finally, cases that met the operational case definition for conditions subject to epidemiological surveillance were used, which could have led to a bias for increased sensitivity. Other considerations are those specific to data collection: data may not always be readily available, particularly in low-resource settings. Additionally, data quality can vary, which may affect the accuracy of the model's diagnosis.
Infectious diseases remain a pressing global concern, often necessitating rapid and accurate detection to mitigate their impact. 24 Obtaining sufficient, high-quality data on emerging diseases, pathogens or strains is essential for training AI algorithms; machine learning models based on limited data can be biased, promoting inappropriate and erroneous diagnoses, with the possibility of greater health inequalities that would make clinical care difficult. 25 Future work should explore the need for region-specific training data sets to improve diagnostic accuracy. We must emphasize that laboratory confirmation remains the ultimate method of surveillance and outbreak investigation. AI and other utilities may be helpful when laboratories are overwhelmed. 14 Finally, some of these diseases are difficult to distinguish because they share common clinical and laboratory features. Failing to consider COVID-19 or influenza because of a positive dengue rapid test result has serious implications not only for the patient but also for public health. 26
One of the main future directions is the continuous improvement of diagnostic accuracy by training AI algorithms with reliable and robust data. As AI algorithms are refined and fed with a greater volume of clinical and epidemiological data, the ability to identify patterns and correlations is expected to become significantly more accurate; however, supervised models and scenarios in local clinical‒epidemiological contexts are needed. This will not only allow the models to diagnose diseases with greater accuracy but also help minimize the risks associated with misdiagnosis, a critical factor for patient safety.
Conclusions
The medical diagnostic algorithm was marginally better than the other algorithms in terms of sensitivity; however, the ANN-based algorithm effectively identified nondiseased patients, particularly at the individual disease level.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076251353292 - Supplemental material for Effectiveness of physician-based diagnosis versus diagnostic artificial intelligence algorithms in detecting communicable febrile diseases in Mexico
Supplemental material, sj-docx-1-dhj-10.1177_20552076251353292 for Effectiveness of physician-based diagnosis versus diagnostic artificial intelligence algorithms in detecting communicable febrile diseases in Mexico by Enrique Alonso Medina Fuentes, Carmen Alicia Ruíz Valdez, Porfirio Felipe Hernández Bautista, David Alejandro Cabrera Gaytán, Guadalupe Minerva Olivas Fabela, José Alberto Mireles Garza, Olga María Alejo Martínez, Brenda Leticia Rocha Reyes, Alfonso Vallejos Parás, Lumumba Arriaga Nieto, Pérez Andrade, Leticia Jaimes Betancourt, Gabriel Valle Alvarado, Oscar Cruz Orozco and Mónica Grisel Rivera Mahey in DIGITAL HEALTH
Footnotes
Ethical considerations
The protocol was submitted for evaluation by the Local Health Research Committee No. 2603 of the Mexican Institute of Social Security, who granted approval with registration number R-2024-2603-038 (May 22, 2024).
Author contributions
EAMF did conceptualization and formal analysis. CARV involved in resources and project administration. DACG did writing the original draft and writing the review and editing. PFHB did formal analysis and validation. GMOF did methodology. JAMG involved in resources. OMAM did methodology and visualization. BLRR did supervision. AVP did investigation and validation. LAN did writing the review and editing. YPA did writing the original draft. LJB did investigation. GVA did writing the original draft. OCO did investigation. MGRM did writing the original draft.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Guarantor
DACG.
Supplementary material
S1. Algorithms and diagnostic methods compared in the study.
S2. Receiver Operating Characteristic curve.
References
