Abstract
Purpose
To develop and assess the performance of a machine learning model that screens chest radiographs for 14 labels, and to determine whether fine-tuning the model on local data improves its performance. Generalizability across institutions has been an obstacle to machine learning model implementation. We hypothesized that the performance of a model trained on an open-source dataset would improve at our local institution after fine-tuning on local data.
Methods
In this retrospective, institutional review board-approved study, an ensemble of neural networks was trained on open-source datasets of chest radiographs for the detection of 14 labels. This model was then fine-tuned using 4510 local radiograph studies, with radiologists’ reports as the gold standard for evaluating model performance. The accuracy of both the open-source and fine-tuned models was tested on 802 local radiographs. Receiver operating characteristic curves were calculated, and statistical analysis was completed using DeLong’s method and the Wilcoxon signed-rank test.
Results
The fine-tuned model identified 12 of 14 pathology labels with areas under the curve greater than .75. After fine-tuning with local data, the model performed statistically significantly better overall, and specifically in detecting six pathology labels.
Conclusions
A machine learning model able to accurately detect 14 labels simultaneously on chest radiographs was developed using open-source data, and its performance was improved after fine-tuning on local site data. This simple method of fine-tuning existing models on local data could improve the generalizability of existing models across different institutions to further improve their local performance.
Introduction
Chest radiographs are commonly used in emergency departments worldwide to help diagnose common and potentially life-threatening conditions. 1 As imaging of patients in emergency departments has become increasingly common, the workload of emergency radiologists has increased exponentially. 2 Prompt communication of findings on radiographs is critical for reducing the risk of adverse clinical outcomes. Hence, radiographs with critical or time-sensitive findings must be read promptly to provide optimal patient care. Machine learning (ML) models can assist radiologists in detecting and reporting time-sensitive pathology by prioritizing life-threatening conditions. 2
ML models have been shown to be very useful in medical imaging analysis.2,3 While many previously developed models achieve high accuracy, relatively few of these have been implemented into clinical workflows. 4 Some reasons for the lack of implementation include limited clinical usefulness of models designed to detect only a single or a handful of pathologies, and a decrease in accuracy when a model is used across different institutions (lack of generalizability).1,5-10 This may be due to differences in hardware, resolution, artifacts, and patient demographics. 8 A very large dataset which includes images from various institutions and geographic regions is required to create a model with good generalizability. 11 However, such large datasets are difficult to acquire in practice.
Fine-tuning a model on a small set of local data helps adjust the model to local reporting practices, patient demographics, and image quality. 6 Acquiring the hundreds of thousands of studies required for training a neural network with data from a single institution is prohibitively time-consuming; therefore, the use of a ML model trained on open-source data and fine-tuned with local data is promising for increasing the adoption of ML models.6,10-12
The purpose of this study was to develop and assess the diagnostic performance of a ML model which screens chest radiographs for 14 labels simultaneously, and to determine whether fine-tuning the model on local data improves its performance. We hypothesized that fine-tuning a neural network by training it with local radiographs will improve its diagnostic accuracy at the same local institution.
Methods
Study Design
This retrospective study was approved by the local institutional review board, with a waiver for consent. All patient data was downloaded, anonymized, and encrypted using institution-approved software. This study involved model creation as well as feasibility testing of our fine-tuning method.
Architecture
The ML model was made up of two large neural networks: one for use with single-view studies (SoloNet), either anteroposterior (AP) or posteroanterior (PA) views, and another for use with studies including both a lateral and either an AP or PA view (DuoNet), shown previously to perform better than single-view networks. 13 All networks were implemented using PyTorch 1.4.0 in Python 3.6. 14
SoloNet is an ensemble network that averages the outputs of three convolutional neural networks (CNNs), DenseNet-121, ResNext50, and MobileNetV2 for multi-class classification.15-17 Each network was initially designed to accept 3-channel RGB images and classify 1000 classes. For use with grayscale chest radiograph images, the initial layer of each network was modified to accept 1-channel inputs. The final fully connected layer was replaced in each network to output 14 classes instead of 1000.
DuoNet is an ensemble network similar to SoloNet but differs by having two identical network backbones which each accept either one of PA or AP, and one lateral view image. 13 As in SoloNet, the input layer of each backbone was modified to accept 1-channel grayscale image inputs. The two backbones were combined by removing the final layer from each backbone and introducing a new layer that received the extracted features from each backbone as inputs, and output 14 classes.
Open-Source Training
Open-source dataset.
Note: Uncertain labels were assigned positive or negative based on uniform distribution.

Summary of local dataset acquisition. Desired study descriptors included: atelectasis, cardiomegaly, oedema, mass, nodule, opacity, normal, pleural effusion, pleural thickening, pneumonia, and pneumothorax.
Local Dataset
Local dataset.
Note: Uncertain labels were subsequently assigned positive or negative based on uniform distribution.
Local Fine-Tuning
The local training and validation sets were used to further train, or “fine-tune,” the individual networks. A stochastic gradient descent optimizer with a learning rate of .001, minibatches of 88 images, and dropout of .8 was used during training. Training was stopped after 10 epochs without improvement in binary cross-entropy loss on the validation data. The resulting machine learning model is referred to hereafter as the “fine-tuned model” (Figure 2).
Summary of the training and fine-tuning process.
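The fine-tuning procedure described above can be sketched as an early-stopping training loop. This is a schematic, assuming the reported hyper-parameters (SGD, learning rate .001, patience of 10 epochs on validation binary cross-entropy); the function name `fine_tune` is hypothetical, and dropout is assumed to live inside the model itself rather than in this loop.

```python
import torch
import torch.nn as nn


def fine_tune(model, train_loader, val_loader, patience=10, max_epochs=200):
    """Continue training a pre-trained model on local data, stopping after
    `patience` epochs without improvement in validation BCE loss."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
    criterion = nn.BCEWithLogitsLoss()
    best_loss, epochs_without_improvement = float("inf"), 0

    for epoch in range(max_epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Evaluate binary cross-entropy on the local validation set.
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item()
                           for x, y in val_loader) / len(val_loader)

        if val_loss < best_loss:
            best_loss, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stopping, per the paper's 10-epoch criterion
    return model
```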
Class Imbalance
Due to class imbalance within the dataset, positive weights were calculated as the ratio of negative case counts to positive case counts. Iterative stratification was applied, which ensures that the proportion of each label present in each dataset (training, validation, and testing) is approximately equal.
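The positive-weight calculation described above is straightforward: for each of the 14 labels, the weight is the count of negative cases divided by the count of positive cases, so rare labels are up-weighted in the loss. A minimal sketch (the helper name `positive_weights` is hypothetical):

```python
import numpy as np


def positive_weights(labels: np.ndarray) -> np.ndarray:
    """Per-label weight = (# negative cases) / (# positive cases).

    `labels` is an (n_studies, n_labels) binary matrix. Rare labels
    receive larger weights, counteracting class imbalance."""
    positives = labels.sum(axis=0)
    negatives = labels.shape[0] - positives
    return negatives / np.maximum(positives, 1)  # guard against zero positives
```

The resulting vector can be passed as the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`.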
Image Processing
High-resolution grayscale images were first cropped to remove black borders, then resized to 512 × 512 pixels. Images were then normalized using the ImageNet normalization parameters, adjusted for grayscale images. 20
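The border-cropping and normalization steps can be sketched as below. This is an assumption-laden illustration: averaging the three ImageNet RGB means and standard deviations into a single grayscale value is one common adjustment, but the paper does not specify how its adjustment was made, and the function names are hypothetical. Resizing to 512 × 512 (e.g., via `cv2.resize` or `torchvision.transforms.Resize`) is omitted.

```python
import numpy as np

# ImageNet channel statistics collapsed to one grayscale value
# (an assumed adjustment; the paper does not state its exact method).
GRAY_MEAN = float(np.mean([0.485, 0.456, 0.406]))  # ~0.449
GRAY_STD = float(np.mean([0.229, 0.224, 0.225]))   # ~0.226


def crop_black_borders(img: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Drop rows and columns whose pixels are all at or below `threshold`."""
    rows = np.where(img.max(axis=1) > threshold)[0]
    cols = np.where(img.max(axis=0) > threshold)[0]
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def normalize(img: np.ndarray) -> np.ndarray:
    """Apply grayscale ImageNet statistics to an image scaled to [0, 1]."""
    return (img - GRAY_MEAN) / GRAY_STD
```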
Label Processing and Augmentation
Unmentioned labels were treated as negative labels. Label smoothing was applied to all uncertain labels. 21 Briefly, uncertain labels were assigned a soft label randomly selected from a uniform distribution between .55 and .85. 21 Labels were corrected to account for lung disease hierarchical dependencies as outlined by Irvin et al. 19 For example, oedema, consolidation, pneumonia, lung lesion, and atelectasis labels required a label of lung opacity. Pneumonia required a label of consolidation, and cardiomegaly required a label of enlarged cardiomediastinum. 19
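The label smoothing and hierarchy correction described above can be sketched as follows. The uniform range [.55, .85] is taken from the text; the parent-child map encodes the stated dependencies (child positive implies parent positive, with pneumonia reaching lung opacity through consolidation). The function names and dictionary keys are illustrative, not the authors' code.

```python
import random


def smooth_uncertain(label, rng=random):
    """Replace an uncertain label with a soft value drawn uniformly
    from [.55, .85], per the paper's label-smoothing scheme."""
    return rng.uniform(0.55, 0.85)


# Hierarchical dependencies as outlined by Irvin et al.:
# a positive child label implies a positive parent label.
PARENT = {
    "oedema": "lung opacity",
    "consolidation": "lung opacity",
    "lung lesion": "lung opacity",
    "atelectasis": "lung opacity",
    "pneumonia": "consolidation",
    "cardiomegaly": "enlarged cardiomediastinum",
}


def enforce_hierarchy(labels: dict) -> dict:
    """Propagate positive labels upward through the hierarchy."""
    labels = dict(labels)
    changed = True
    while changed:  # iterate so chains (pneumonia -> consolidation -> opacity) resolve
        changed = False
        for child, parent in PARENT.items():
            if labels.get(child) == 1 and labels.get(parent) != 1:
                labels[parent] = 1
                changed = True
    return labels
```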
Data Augmentation
To improve generalizability and orientation invariance, horizontal flips and rotations between −10 and 10 degrees were randomly applied with a 50% probability to input images during training.
Visualization
GradCAM was used to visualize class activation mappings and to localize the regions within each chest radiograph that contributed to the predicted class labels. 22
Evaluation
The open-source and locally fine-tuned models were evaluated on a test set consisting of 802 locally acquired radiograph studies, dating from 2012-2019. Labels were extracted from the radiologist reports using the CheXpert labeller. 19
Receiver operating characteristic curves and area under the curve (AUC) were calculated for each model’s detection of each label. The overall AUC of each model was also calculated, representing the mean AUC of all 14 labels. DeLong’s method was used to compare each model’s AUC for the detection of individual labels. 23
Statistical significance of the overall AUC of the two models was calculated using a Wilcoxon signed-rank test, as outlined by Demsar. 24
Statistical significance was defined as
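The overall comparison can be sketched with standard scientific Python tools: per-label AUCs via scikit-learn, then a Wilcoxon signed-rank test on the 14 paired AUCs, following Demsar's recommendation for comparing classifiers over multiple tasks. DeLong's per-label test has no SciPy implementation and is not shown; the function names here are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from scipy.stats import wilcoxon


def per_label_aucs(y_true: np.ndarray, y_score: np.ndarray) -> np.ndarray:
    """AUC for each label; y_true is (n, 14) binary, y_score is (n, 14)."""
    return np.array([roc_auc_score(y_true[:, k], y_score[:, k])
                     for k in range(y_true.shape[1])])


def compare_models(y_true, scores_a, scores_b):
    """Wilcoxon signed-rank test on the paired per-label AUCs of two models.

    Returns each model's overall (mean) AUC and the test's p-value."""
    auc_a = per_label_aucs(y_true, scores_a)
    auc_b = per_label_aucs(y_true, scores_b)
    stat, p = wilcoxon(auc_a, auc_b)
    return auc_a.mean(), auc_b.mean(), p
```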
Results
Model Training
Open-source training runtime ranged from 6 to 12 hours. Fine-tuning runtime ranged from 1 to 4 hours (Supplemental Appendices 1 and 2).
Model Testing
Comparison of area under the receiver operating curve of open-source and fine-tuned models for 14 labels.
Note: Parentheses indicate 95% confidence intervals. Bold values indicate a statistically significant difference between models.
Sensitivity and specificity of open-source and fine-tuned models for 14 labels.
Note: Bolded values represent model with greater sum of specificity and sensitivity.
GradCAM was used to localize important regions within each chest radiograph in the testing dataset. An example of GradCAM results is shown (Figure 3).
Example of GradCAM results for one tested radiograph, highlighting the areas with abnormalities detected: a) original chest radiograph; b) identification of pneumothorax; c) identification of pleural effusion; d) identification of atelectasis; and e) identification of support device (chest tube).
Discussion
Performance of ML models can decrease when used at institutions different than the ones where they were developed.5,7,8 We demonstrated that fine-tuning using a small local dataset may be a solution for adapting models across different institutions. Fine-tuning may improve performance by biasing the model towards our local institution’s image specifications, techniques, protocols, and equipment as well as biasing towards local radiologist reporting practices. 6 While this bias is usually undesirable, as it decreases a model’s generalizability, it is useful for training a model for use at one specific institution. This approach could be used to adapt available models for use in institutions that do not currently develop their own models.
Currently, there are few Health Canada-approved ML models for chest radiograph analysis. ClearRead Xray has two algorithms available: one for lung nodule detection with an AUC of .558, and one that highlights tubes and catheters to reduce study interpretation time. 26 One algorithm, xrAI, detects pulmonary abnormalities with heatmaps, but publicly available results are limited beyond a reported 20% improvement in diagnosis. 26 More algorithms have Food and Drug Administration approval in the United States, including algorithms for detection of pneumothorax exclusively, pleural effusion exclusively, and 10 abnormalities on chest radiographs. 26 While these models perform with high accuracy, many have limited utility as they only detect a small number of chest pathologies. Additionally, they can be quite costly to an institution. Therefore, an open-source model fine-tuned to a specific institution’s needs provides a more cost-effective option.
Previously in the literature, transfer learning has been used to adapt machine learning models trained on natural (non-radiological) images for applications in radiology by re-training the model with a relatively small number of radiological images.11,27,28 Very few examples exist of a fine-tuning method like ours, in which a model originally trained on medical images is fine-tuned with local data to improve its performance at a specific institution. One example by Rauschecker et al. used a similar fine-tuning method to optimize a brain MRI lesion segmentation algorithm trained at one institution by fine-tuning the model with data from 51 patients at a second institution. 29 Another example by Kitamura and Deible used a model trained to detect multiple pathologies on chest radiographs and retrained it specifically for the detection of pneumothorax at their local institution. 30 Like ours, both of these studies found that training with a small data subset from the second institution increased performance.29,30 Therefore, our study adds to the small body of literature showing that local fine-tuning is an effective method for improving an open-source model for use at a specific institution.
Accuracy of the fine-tuned model is likely affected by the size and quality of the local dataset. The AUCs for detection of 14 different labels varied between .642 and .898 for the open-source model and between .675 and .897 for the fine-tuned model. The wide ranges in our AUC values show that the models have higher diagnostic accuracy in detecting certain labels than others. For example, they are less accurate at identifying fractures than pleural effusions, perhaps because chest radiographs are not optimized for fracture detection and old fractures are commonly not mentioned in the reports used to train the model. There was also a low number of cases positive for fractures in our datasets: only 3.1% of the open-source dataset and 5.8% of the local dataset had positive labels for fracture (Tables 1 and 2). To further improve the diagnostic accuracy of the fine-tuned model, one could continue the fine-tuning process with a larger dataset of local radiographs. We were able to achieve a statistically significant improvement for multiple pathologies with a modest dataset of only 4510 local studies. By increasing the size of the local dataset, one could potentially further improve performance.
Using multiple radiologists to label the local dataset rather than labels extracted from existing reports by the CheXpert labeller could offer another avenue for improving diagnostic accuracy. The CheXpert labeller is limited by only using labels mentioned in the radiologist report. For example, a report may read only “no interval change” for a chest radiograph showing a pleural effusion unchanged from prior studies, even though that pathology is present on the image. Automation of labelling saves time, but accuracy of the model may be improved if each radiograph in the local dataset is read and labelled by a radiologist for the purpose of the study.
Our next step is to integrate this model into the emergency radiology workflow to triage incoming radiographs, identifying those for immediate interpretation by the radiologist. We plan to undertake a study to determine the clinical impact of this model by determining whether using it for triage in our emergency radiology department decreases time to interpretation of radiographs with urgent findings. Currently, radiographs are read chronologically.
In our fine-tuned model, diagnostic accuracy in identifying a chest radiograph without pertinent findings was high, with an AUC of .897 (Table 3) and sensitivity of .94 (Table 4). High sensitivity for detecting radiographs with no findings is important in the context of screening for acute findings and prioritizing worklists. 31 Although higher individual AUC values have been reported in the literature for detection of some of these chest pathologies, those models usually detect only a few pathologies. 1 Lower AUCs were deemed an acceptable trade-off for our model’s ability to detect a wide range of pathologies for triaging purposes. In addition, the threshold used to calculate sensitivity and specificity values from the ROC was selected to maximize the sum of sensitivity and specificity (Table 4). However, when applied clinically, a threshold representing a different point on the ROC could be used to favour different sensitivity and specificity values, depending on the clinical application. For example, for the purpose of prioritizing radiographs with urgent findings it is important to minimize false negatives; therefore, lower specificity may be accepted to prioritize high sensitivity.
One main limitation of our study is that we have only shown improvements by fine-tuning of one model at one institution. It is unclear whether this method would be effective for multiple other ML models and at multiple other institutions. Another limitation is the use of the NLP labeller as described above. As our objective was to test whether our relatively simple and time-effective process of fine-tuning was a viable method for an individual institution to improve an open-source model’s accuracy for local use, we used an NLP labeller because it is less time and resource-intensive than labelling by expert radiologists. However, for optimal results, the local datasets would ideally be labelled by at least two expert radiologists, eliminating the potential errors introduced by using NLP labellers. In the future, the accuracy of NLP labellers may improve and eliminate the bottleneck of manual labelling. Finally, while our fine-tuned model is designed specifically for use at our institution, other institutions could use our fine-tuning method to adapt an open-source model for their clinical workflow if a small set of local radiograph studies and existing radiologist reports are accessible.
In conclusion, we have successfully fine-tuned a model originally trained on open-source data with a relatively small amount of local data, to accurately detect 14 labels on chest radiographs at our local institution. We have shown that our fine-tuning process significantly improved the overall diagnostic accuracy of the model. This method is much less time and resource-intensive than creating a new model as the initial training process requires hundreds of thousands of labelled radiograph studies. 8 Our fine-tuned model could potentially be useful in the emergency department for worklist prioritization but could also be applicable to inpatient and outpatient radiology.
Supplemental Material
Supplemental Material - Machine Learning Model for Chest Radiographs: Using Local Data to Enhance Performance
Supplemental Material for Machine Learning Model for Chest Radiographs: Using Local Data to Enhance Performance by Sarah F. Mohn, Marco Law, Maria Koleva, Brian Lee, Adam Berg, Nicolas Murray, Savvas Nicolaou, and William A. Parker in Canadian Association of Radiologists Journal
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Summary Statement
Fine-tuning a machine learning model with a relatively small amount of local site imaging data is a feasible method for adapting machine learning models for use at individual institutions.
References
