Multicenter validation of synthetic FLAIR as a substitute for FLAIR sequence in acute ischemic stroke

Abstract

Purpose:

To compare the performance of synthetic and real FLAIR for identifying early stroke in a multicenter cohort.

Methods:

This retrospective study used DWI and FLAIR sequences extracted from the Endovascular Treatment in Ischemic Stroke image registry (2017–2021). The database was partitioned into subsets according to MRI field strength and manufacturer, and randomly divided into a training set (70%) used for model fine-tuning, a validation set (15%), and a test set (15%). In the test set, five readers, blinded to FLAIR sequence type, assessed DWI-FLAIR mismatch using real and synthetic FLAIR. Interobserver agreement for DWI-FLAIR rating and concordance between synthetic and real FLAIR were evaluated with kappa statistics. Sensitivity and specificity for identification of ⩽4.5 h AIS were compared in patients with known onset-to-MRI delay using McNemar’s test.

Results:

1454 complete MRI sets (1172 patients, median (IQR) age: 73 years (62–82); 762 women) acquired on 125 MRI units were analyzed. In the test set (207 MRI sets), interobserver reproducibility for DWI-FLAIR mismatch labeling was substantial for real and synthetic FLAIR (Fleiss κ = 0.79 (95%CI: 0.73–0.84) and 0.77 (95%CI: 0.71–0.82), respectively). After consensus, concordance between real and synthetic FLAIR was excellent (κ = 0.85 (95%CI: 0.78–0.92)). In 141 MRI sets with known onset-to-MRI delay, diagnostic performance for ⩽4.5 h AIS identification did not differ between real and synthetic FLAIR (sensitivity: 60/71 (85%) vs 59/71 (83%), p = 0.56; specificity: 65/70 (93%) vs 65/70 (93%), p > 0.99).

Conclusion:

A deep-learning-based synthetic FLAIR fine-tuned on multicenter data can provide performance comparable to real FLAIR for early AIS identification. This approach may help reduce MR protocol duration and motion artifacts.

Keywords

Acute stroke, ischemic stroke, DWI-FLAIR mismatch, synthetic FLAIR

Introduction

In acute ischemic stroke (AIS), the decision to offer recanalization treatment, by either intravenous thrombolysis or endovascular therapy, depends heavily on imaging data and time constraints.^1–3 While the treatment window has expanded, notably to late stroke patients,^4–6 imaging has emerged as a cornerstone to assess collaterals^7,8 and identify potentially salvageable brain parenchyma beyond pre-defined timeframes.⁹ In this setting and given the variability of stroke growth, optimizing the imaging workflow remains crucial for providing the best-informed clinical decision support in the shortest possible time.¹⁰ Depending on the available modality, CT or MRI is typically performed before the treatment decision, leading to comparable treatment delays in known-onset stroke.¹¹

In up to 27% of AIS,¹² imaging is used to estimate the stroke onset time when it is unknown, such as in wake-up and unwitnessed strokes. In these clinical situations, MRI may estimate the time of symptom onset by assessing the DWI-FLAIR mismatch, that is, the presence of a diffusion restriction on DWI without any significant signal change on the FLAIR sequence.^13,14

A previous single-center study showed that a synthetic FLAIR sequence (hereafter referred to as synthFLAIR) could be computed from the DWI sequence and be as clinically relevant as the real FLAIR (realFLAIR) sequence for assessing this mismatch pattern. Indeed, the T2-weighted acquisition embedded in the DWI sequence before application of diffusion gradients (b = 0 s/mm² (b0)) has been shown to contain information about stroke FLAIR visibility, but its analysis is limited in cortical regions where cerebrospinal fluid intensity is high.¹⁵ The synthFLAIR sequence, whose calculation relies on these signal changes, should in contrast retain good diagnostic value near cortical areas. Furthermore, its use should reduce the MR protocol time by avoiding realFLAIR acquisition, and provide an alternative to realFLAIR in case of motion artifacts in restless patients.¹⁶ However, the original synthFLAIR model was developed on a homogeneous single-center dataset from a single MRI unit, and its generalizability to images from different MRI vendors, field strengths, and variable sequence parameters is unknown.¹⁷ For large-scale application, the synthFLAIR model must be adapted and validated in a multicenter environment to overcome a potential domain shift.¹⁸ Multicenter-level studies are now part of AI recommendations driven by the FDA¹⁹ and guideline initiatives, such as the CLAIM (Checklist for Artificial Intelligence in Medical Imaging),²⁰ because of potential training bias and confounding factors that may impede future broader utilization. This is even more sensitive for some unsupervised generative deep-learning models, which can be influenced by the data distribution during the training phase, as strikingly demonstrated by the hallucinated images they sometimes produce.²¹

Here, we used a fine-tuning procedure to adapt the synthFLAIR model to DWI data acquired with different manufacturers and field strengths, to make it compatible with DWI data from any of these manufacturers at either 1.5 or 3 T. The aim of the study was to compare the diagnostic value of the new fine-tuned synthFLAIR with that of the realFLAIR sequence for DWI-FLAIR mismatch assessment and identification of stroke patients within 4.5 h from symptom onset in a national multicenter cohort.

Materials and methods

Data source

This retrospective study included adult patients who underwent recanalization treatment for AIS and were enrolled in the prospective multicenter observational Endovascular Treatment in Ischemic Stroke (ETIS) registry (ClinicalTrials Identifier: NCT03776877, approved by ethics committee ID-RCB number: 2017-A03457-46) between 2017 and December 2021. Written informed consent was obtained, and data collection and analysis were approved by the ETIS review board. ETIS is a clinical multicenter registry including patients aged 18 years or older who present with an acute stroke due to a large vessel occlusion with an indication for mechanical thrombectomy, with or without intravenous thrombolysis treatment.²² ETIS-image is an associated daughter image database collecting MR and CT imaging performed in these patients and in the subgroup of centers transmitting raw image data. Inclusion criteria in our study were: (1) availability of baseline MRI acquired before treatment and/or at early follow-up; (2) availability of paired FLAIR and DWI sequences with low and high b-values (hereafter referred to as b_low and b_high). Clinical data including age, sex, National Institutes of Health Stroke Scale (NIHSS) score at admission, recanalization treatment, and stroke onset-to-MRI delay were also collected.

Selection of data subsets and stratified data partition

FLAIR and DWI sequences were evaluated by one reader (resident, G.Ha.) on an ordinal scale ranging from 1 (low quality) to 3 (high quality), and MRI sets of inadequate quality due to major artifacts were excluded. MRI datasets were partitioned into subsets according to the MRI field strength (1.5 or 3 Tesla) and manufacturer (General Electric Healthcare, Siemens Healthineers, and Philips Healthcare, hereafter referred to as manufacturers 1, 2, and 3, respectively). A specific subset was used for DWI sequences from one MRI unit from manufacturer 2 in which DWI was acquired with a b_low value of 50 s/mm² (rather than 0 s/mm² in all other subsets; Supplemental Table 1). MRI sets were randomly divided into training (70%), validation (15%), and test (15%) sets with randomization stratified on the seven subsets, MRI time (baseline or follow-up), and FLAIR quality (1, 2, or 3). Each MRI was assigned to one of the 42 (7 × 2 × 3) stratification groups depending on these three variables, and stratified splitting was performed using a dedicated function from the scikit-learn library.²³ To fulfill the independence assumption for statistical tests in the test set, follow-up MRIs were excluded from analysis if the same patient was included twice in the test set.
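The stratified 70/15/15 partition described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the code used in the study: the text states only that a scikit-learn function was used, so the function name `stratified_split`, the record fields (`subset`, `mri_time`, `flair_quality`), the seed, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(records, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split records into train/validation/test within each stratum.

    A stratum is the combination (subset, MRI time, FLAIR quality),
    giving at most 7 x 2 x 3 = 42 groups. Field names are hypothetical.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        key = (rec["subset"], rec["mri_time"], rec["flair_quality"])
        strata[key].append(rec)

    train, val, test = [], [], []
    for key in sorted(strata):
        group = strata[key]
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_val = round(fractions[1] * len(group))
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


# Toy data: 20 MRI sets in each of the 42 strata (illustrative only).
records = [
    {"id": f"{s}-{t}-{q}-{i}", "subset": s, "mri_time": t, "flair_quality": q}
    for s in "ABCDEFG"
    for t in ("baseline", "follow-up")
    for q in (1, 2, 3)
    for i in range(20)
]
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))
```

Splitting inside each stratum keeps the proportions of subsets, MRI times, and FLAIR quality grades similar across the three sets, which is the purpose of the stratification reported in the text.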

Data preprocessing

RealFLAIR sequences were co-registered onto the corresponding DWI data using a 6-parameter rigid registration with Advanced Normalization Tools version 2.3.5 (https://stnava.github.io/ANTs). All MRI sets were either up-scaled or down-scaled to a standard 256 × 256 square matrix size after signal normalization. Additional preprocessing steps, including data augmentation, were performed during training only, according to a previously published pipeline¹⁶ (Supplemental Methods 1).

Deep-learning model update and domain adaptation

The original synthFLAIR model¹⁶ (“Vanilla model”) was adapted to require only DWI source images as input data (i.e. without apparent diffusion coefficient (ADC) map; Supplemental Methods 2 and Supplemental Figure 1). In addition, the updated model was fine-tuned for each subset of the database. We favored this supervised domain adaptation strategy after an exploratory analysis using supervised²⁴ and unsupervised²⁵ methods evaluated on the validation set (Supplemental Methods 3). Source code and model weights are freely available at http://github.com/NeuroSainteAnne/synthFLAIR. After training, synthFLAIR images were generated in the test set by applying each fine-tuned model to the DWI source images.

Image analysis

For DWI-FLAIR mismatch assessment, DWI images and either realFLAIR or synthFLAIR were presented in a random order to four neuroradiologists (G.Hm., J.B., L.L., and C.O.), with 6, 4, 11, and 21 years of experience in stroke imaging, respectively, and one resident (G.Ha.). Readers were blinded to data subset, FLAIR sequence type, and onset-to-MRI delay. The FLAIR lesion was categorized as not visible (i.e. presence of DWI-FLAIR mismatch), visible (i.e. absence of DWI-FLAIR mismatch), or not assessable (because of extensive white matter disease or artifacts), following the Wake-Up Stroke trial specifications.²⁶ One reader (G.Ha.) repeated the procedure for intraobserver reproducibility assessment after a 2-month washout period. Discrepancies between readers were resolved by consensus, either automatically when a majority agreement was reached (i.e. >3 readers among five assigned the same rating), or after agreement of two senior readers for dubious cases.

Besides the visual analysis, the ratio of signal intensity (rSI), defined as the signal intensity of the ischemic lesion relative to the contralateral signal intensity,¹⁶ was computed on both realFLAIR and synthFLAIR and used to assess FLAIR status and detect ⩽4.5 h AIS.
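As a minimal illustration of the rSI definition given above, the ratio can be sketched as the mean intensity in the lesion divided by the mean intensity in a contralateral reference region. The way the regions are delineated follows the reference study¹⁶ and is not reproduced here; the use of the mean, the function name, and the voxel values below are assumptions made for the example.

```python
from statistics import mean


def ratio_of_signal_intensity(lesion_values, contralateral_values):
    """Mean lesion intensity divided by mean contralateral intensity."""
    if not lesion_values or not contralateral_values:
        raise ValueError("both regions must contain at least one voxel")
    reference = mean(contralateral_values)
    if reference == 0:
        raise ValueError("contralateral mean intensity is zero")
    return mean(lesion_values) / reference


# Hypothetical voxel intensities (arbitrary units) from one FLAIR slice.
lesion = [132.0, 128.0, 140.0, 136.0]
contralateral = [118.0, 122.0, 120.0, 120.0]
rsi = ratio_of_signal_intensity(lesion, contralateral)
print(f"rSI = {rsi:.3f}")
```

The same function would be applied once to realFLAIR and once to synthFLAIR intensities sampled in the same regions, so that the two rSI values can be compared.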

Statistical analysis

Statistical analyses were performed with open-source software (R, version 4.0.1; R Foundation). Interobserver agreement among the five readers for DWI-FLAIR mismatch rating, on realFLAIR and on synthFLAIR, was assessed using the Fleiss kappa (κ) coefficient. Intraobserver reproducibility for DWI-FLAIR mismatch assessment and concordance between realFLAIR and synthFLAIR were evaluated with the Cohen kappa coefficient. Sensitivity, specificity, and positive and negative predictive values of DWI-FLAIR mismatch for the identification of ⩽4.5 h AIS were compared between realFLAIR and synthFLAIR using McNemar’s test and the relative predictive value method.²⁷ The rSIs were compared between realFLAIR and synthFLAIR using Pearson correlation coefficients. Areas under the receiver operating characteristic curve (AUCs) for identifying ⩽4.5 h AIS were computed using rSI in stroke patients with known onset-to-MRI delay, and AUC comparison was performed using DeLong’s method. Subgroup analysis was additionally performed in the 2–9 h target window, and subgroup analyses were also performed at the subset level. Given that DWI obtained with b_low = 50 s/mm² might not contain complete T2-weighted information, we performed a subgroup analysis in all subsets with b_low = 0 s/mm². An additional post-hoc analysis of interobserver DWI-FLAIR mismatch assessment was performed in restless patients excluded from the initial data partition for inadequate FLAIR quality due to major artifacts. Values are reported with interquartile ranges (IQRs) and/or 95%CIs. The statistical significance threshold was p < 0.05.
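For readers less familiar with the agreement statistics named above, the following sketch shows how Cohen's κ, Fleiss' κ, and a McNemar test can be computed from raw ratings. It is an illustration in plain Python, not the R code used in the study; the exact (binomial) form of McNemar's test is an assumption, since the text does not state which variant was applied, and all ratings and counts are hypothetical.

```python
from collections import Counter
from math import comb


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two paired lists of categorical ratings."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings_per_case):
    """Fleiss' kappa; each case is a list with one rating per reader."""
    n_cases = len(ratings_per_case)
    n_readers = len(ratings_per_case[0])
    totals = Counter()
    agreement = 0.0
    for case in ratings_per_case:
        counts = Counter(case)
        totals.update(counts)
        # Proportion of agreeing reader pairs for this case.
        agreement += sum(c * (c - 1) for c in counts.values()) / (
            n_readers * (n_readers - 1))
    p_bar = agreement / n_cases
    p_e = sum((t / (n_cases * n_readers)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical ratings of four MRI sets by two readers.
reader_1 = ["not visible", "not visible", "visible", "visible"]
reader_2 = ["not visible", "visible", "visible", "visible"]
kappa_two_readers = cohen_kappa(reader_1, reader_2)

# Hypothetical ratings of four MRI sets by three readers.
cases = [
    ["visible", "visible", "visible"],
    ["not visible", "not visible", "not visible"],
    ["visible", "visible", "not visible"],
    ["not visible", "not visible", "visible"],
]
kappa_three_readers = fleiss_kappa(cases)

# Hypothetical discordant pairs: 1 case correct only with one sequence,
# 3 cases correct only with the other.
p_value = mcnemar_exact(1, 3)
print(kappa_two_readers, kappa_three_readers, p_value)
```

McNemar's test uses only the discordant pairs, which is why it suits a paired design in which the same MRI sets are read with both realFLAIR and synthFLAIR.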

Results

Patients and MRI set characteristics

In total, 1490 complete MRI sets were screened. After exclusion of 27 low-quality datasets (including 16 with inadequate FLAIR quality; Figure 1), 1463 MRI sets from 1172 patients (762 women; median age: 73 years (IQR, 62–82)) acquired on 125 different MRI units were included. After data splitting, nine follow-up MRIs from patients who would otherwise have been included twice in the test set were excluded, leading to a final number of 1454 analyzed MRIs, of which 1023 (70%) were used for training, 224 (15%) for validation, and 207 (15%) for testing. Among these MRI sets, 1013 (70%) were acquired before treatment and 441 (30%) at early follow-up. Clinical data are summarized in Table 1. MRI units and DWI and FLAIR sequence parameters in the seven subsets are reported in Supplemental Table 1.

Figure 1.

Flow chart for MRI set and patient inclusion.

Table 1.

Data sets and patient characteristics.

Variable	Training set	Validation set	Test set
No. of MRI sets	1023 (70)	224 (15)	207 (15)
No. of MRI sets with known onset-to-MRI delay	723 (71)	157 (70)	143 (70)
Onset-to-MRI delay (h), per MRI set*	3.75 (2–28)	3.6 (2–26)	4.2 (1.9–29.3)
No. of patients	885	218	207
Age (y)	71 (59–83)	69 (59–97)	70 (60–98)
No. of women	524 (51)	122 (54)	116 (54)
NIHSS score at admission*	13.5 (7–19)	14 (9–20)	14 (7–20)
Intravenous thrombolysis^†	403 (52)	95 (54)	86 (50)
Mechanical thrombectomy^‡	667 (89)	151 (88)	164 (96)
No. of MRI sets in each subset
Subset A: Manufacturer 1, 1.5 T	242 (24)	53 (24)	48 (23)
Subset B: Manufacturer 2, 1.5 T b_low = 0 s/mm²	375 (37)	81 (36)	77 (37)
Subset C: Manufacturer 2, 1.5 T b_low = 50 s/mm²	105 (10)	23 (10)	21 (10)
Subset D: Manufacturer 3, 1.5 T	67 (6)	15 (7)	13 (6)
Subset E: Manufacturer 1, 3 T	62 (6)	14 (6)	12 (6)
Subset F: Manufacturer 2, 3 T	92 (9)	20 (9)	19 (9)
Subset G: Manufacturer 3, 3 T	80 (8)	18 (8)	17 (8)

NIHSS: National Institutes of Health Stroke Scale.

Values are expressed as numbers of patients with percentages in parentheses, unless otherwise specified.

*Data are expressed as median with interquartile range in parentheses. NIHSS data were missing in respectively 46 (5%), 16 (7%), and 9 (4%) patients in training, validation, and test sets.

†Missing data in respectively 114, 43, and 34 patients.

‡Missing data in respectively 135, 47, and 37 patients.

Reproducibility of DWI-FLAIR mismatch assessment

Intraobserver reproducibility was not statistically different between realFLAIR and synthFLAIR (κ = 0.82 (95%CI: 0.74–0.90) and 0.75 (95%CI: 0.66–0.84), respectively, p = 0.27). Interobserver reproducibility was substantial for both realFLAIR and synthFLAIR sequences and not significantly different between them (κ = 0.79 (95%CI: 0.73–0.84) and 0.77 (95%CI: 0.71–0.82), respectively, p = 0.58; Table 2).

Table 2.

Interobserver reproducibility for DWI-FLAIR mismatch assessment between the five readers.

Test set	realFLAIR	synthFLAIR	p Value
Subset A: Manufacturer 1, 1.5 T (n = 48)	0.71 (0.59, 0.83)	0.70 (0.57, 0.83)	0.904
Subset B: Manufacturer 2, 1.5 T b_low = 0 s/mm² (n = 77)	0.83 (0.74, 0.92)	0.80 (0.71, 0.89)	0.655
Subset C: Manufacturer 2, 1.5 T b_low = 50 s/mm² (n = 21)	0.80 (0.62, 0.99)	0.58 (0.31, 0.85)	0.219
Subset D: Manufacturer 3, 1.5 T (n = 13)	0.59 (0.26, 0.92)	0.73 (0.42, 1.00)	0.485
Subset E: Manufacturer 1, 3 T (n = 12)	0.65 (0.36, 0.93)	0.76 (0.50, 1.00)	0.494
Subset F: Manufacturer 2, 3 T (n = 19)	0.81 (0.57, 1.00)	0.77 (0.54, 1.00)	0.796
Subset G: Manufacturer 3, 3 T (n = 17)	0.94 (0.82, 1.00)	0.85 (0.67, 1.00)	0.427
All subsets except Subset C (n = 186)	0.78 (0.72, 0.84)	0.77 (0.71, 0.83)	0.89
All subsets (n = 207)	0.79 (0.73, 0.84)	0.77 (0.71, 0.82)	0.58

FLAIR: fluid-attenuated inversion recovery.

Values are expressed as κ Fleiss coefficient with 95%CI in parentheses.

The lowest interobserver reproducibility for synthFLAIR was obtained in Subset C (MRI sets with b_low = 50 s/mm²; κ = 0.80 (95%CI: 0.62–0.99) for realFLAIR and 0.58 (95%CI: 0.31–0.85) for synthFLAIR, p = 0.22). In MRI sets with b_low = 0 s/mm² (n = 186), interobserver reproducibility was substantial for both realFLAIR and synthFLAIR sequences (κ = 0.78 (95%CI: 0.72–0.84) and κ = 0.77 (95%CI: 0.71–0.83), respectively, p = 0.89).

Concordance between realFLAIR and synthFLAIR for mismatch assessment

Depending on the reader, concordance between realFLAIR and synthFLAIR ratings ranged from substantial to excellent (κ = 0.70–0.83; Table 3). After consensus, four MRI sets were considered non-assessable and were thus excluded from analysis. Dubious cases were resolved by consensus review in 14/203 (7%) realFLAIR and 19/203 (9%) synthFLAIR (p = 0.38).

Table 3.

Concordance of DWI-FLAIR mismatch assessment between realFLAIR and synthFLAIR.

Test set	Reader 1	Reader 2	Reader 3	Reader 4	Reader 5	After consensus
Subset A: Manufacturer 1, 1.5 T (n = 48)*	0.80	0.76	0.75	0.79	0.53	0.91 (0.79–1.00)
Subset B: Manufacturer 2, 1.5 T b_low = 0 s/mm² (n = 77)*	0.90	0.72	0.77	0.84	0.71	0.89 (0.79–0.99)
Subset C: Manufacturer 2, 1.5 T b_low = 50 s/mm² (n = 21)	0.53	0.67	0.59	0.39	0.49	0.63 (0.25–1.00)
Subset D: Manufacturer 3, 1.5 T (n = 13)*	0.71	0.37	0.52	0.37	0.59	0.47 (0.01–0.98)
Subset E: Manufacturer 1, 3 T (n = 12)	0.83	0.83	0.53	0.82	1.00	0.82 (0.50–1.00)
Subset F: Manufacturer 2, 3 T (n = 19)	0.87	0.87	0.88	1.00	0.89	0.87 (0.63–1.00)
Subset G: Manufacturer 3, 3 T (n = 17)	0.87	0.64	0.85	0.87	0.80	0.87 (0.61–1.00)
All subsets except Subset C (n = 186)*	0.85	0.72	0.75	0.82	0.71	0.87 (0.80–0.94)
All subsets (n = 207)*	0.83	0.73	0.75	0.79	0.70	0.85 (0.78–0.92)

Values are expressed as κ values with 95%CI in parentheses when applicable.

*In subsets A, B, and D, respectively 2, 1, and 1 MRI sets were considered non-assessable and were thus excluded from analysis after consensus.

After consensus, concordance between realFLAIR and synthFLAIR on the 203 assessable MRI sets was excellent (κ = 0.85 (95%CI: 0.78–0.92)). Concordance was also excellent in the subgroup of 182 assessable MRI sets with b_low = 0 s/mm² (κ = 0.87 (95%CI: 0.80–0.94)). Illustrative examples are presented in Figure 2. The consensus assessment is used in the analyses that follow.

Figure 2.

Diffusion-weighted imaging (DWI)–fluid-attenuated inversion recovery (FLAIR) mismatch assessment using acquired FLAIR sequence (realFLAIR) and synthetic FLAIR (synthFLAIR) in AIS. (a) DWI-FLAIR mismatch in a 69-year-old man (subset B). On 1.5 T DWI (b_high = 1000 s/mm²) obtained 2 h and 15 min from symptom onset, diffusion restriction is seen in the left middle cerebral artery territory without signal change on the realFLAIR and synthFLAIR. (b) DWI-FLAIR mismatch in a 59-year-old man (subset G). On a 3 T DWI (b_high = 1000 s/mm²) obtained 4 h from symptom onset, large diffusion restriction is seen in the right middle cerebral artery territory without significant signal change on the 3D realFLAIR and synthFLAIR. Note that the DWI based on EPI technique is prone to artifacts on the periphery, which results in less accurate frontal cortex delineation on synthFLAIR compared to realFLAIR. (c) Absence of DWI-FLAIR mismatch in a 54-year-old man (subset A). On 1.5 T DWI (b_high = 1000 s/mm²) obtained 6 h and 10 min from symptom onset, diffusion restriction is seen in the left middle cerebral artery territory, also visible on realFLAIR and synthFLAIR.

Identification of ⩽4.5 h AIS with qualitative and quantitative analysis

Stroke onset-to-MRI delay was known in 141 of 203 assessable MRI sets from the test set. DWI-FLAIR mismatch correctly classified stroke delay (⩽4.5 h or >4.5 h from stroke onset) in 125/141 (89%) MRI sets with realFLAIR and 124/141 (88%) with synthFLAIR (p > 0.99). The sensitivity and specificity of the DWI-FLAIR mismatch for the identification of ⩽4.5 h AIS were not significantly different between realFLAIR and synthFLAIR (sensitivity: 60/71 (85%) vs 59/71 (83%), p = 0.56; specificity: 65/70 (93%) vs 65/70 (93%), p > 0.99; Table 4).

Table 4.

Comparison of the diagnostic value of DWI-FLAIR mismatch after consensus review between realFLAIR and synthFLAIR to estimate stroke onset time within 4.5 h.

Statistic	realFLAIR	synthFLAIR	p Value
Sensitivity	60/71 (85)	59/71 (83)	0.56
Specificity	65/70 (93)	65/70 (93)	>0.99
Positive predictive value	60/65 (92)	59/64 (92)	0.97
Negative predictive value	65/76 (86)	65/77 (84)	0.57

Diagnostic value was computed in the 141 MRI datasets when stroke onset-to-MRI delay was available. Data are expressed as number of MRI sets, with corresponding percentages in parentheses. Sensitivity and specificity were compared using the McNemar test. Predictive values were compared using the relative predictive value method.
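The fractions in Table 4 determine the underlying 2 × 2 counts, so the reported percentages can be re-derived directly. The short sketch below does this, taking an onset-to-MRI delay ⩽4.5 h as the positive condition and the presence of a DWI-FLAIR mismatch as the positive test; the function name is illustrative, and the counts are read from the numerators and denominators of the table.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic metrics from the four cells of a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts read from Table 4:
# realFLAIR:  sensitivity 60/71, specificity 65/70, PPV 60/65, NPV 65/76
# synthFLAIR: sensitivity 59/71, specificity 65/70, PPV 59/64, NPV 65/77
real = diagnostic_metrics(tp=60, fn=11, tn=65, fp=5)
synth = diagnostic_metrics(tp=59, fn=12, tn=65, fp=5)

for name in real:
    print(f"{name}: realFLAIR {real[name]:.1%}, synthFLAIR {synth[name]:.1%}")
```

The accuracy values obtained this way, 125/141 and 124/141, match the proportions of correctly classified MRI sets reported in the text.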

rSIs measured on realFLAIR and synthFLAIR were highly correlated (Pearson r = 0.83 (95%CI: 0.78–0.87)). Pearson coefficient ranged from 0.77 (subset C, MRI sets with b_low = 50 s/mm²) to 0.92 (subset D) and was equal to 0.83 (95%CI: 0.78–0.87) in the subgroup of all MRI sets with b_low = 0 s/mm².

AUCs using rSI for identification of ⩽4.5 h AIS were not significantly different between realFLAIR and synthFLAIR (respectively 0.90 (95%CI: 0.85–0.96) and 0.86 (95%CI: 0.84–0.95), p = 0.85), nor were they significantly different in the subgroup of MRI sets with b_low = 0 s/mm² (respectively 0.89 (95%CI: 0.83–0.95) and 0.91 (95%CI: 0.86–0.97), p = 0.60).
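To make the AUC analysis concrete, the sketch below computes an AUC from rSI values using the rank (Mann-Whitney) formulation: the AUC is the probability that a randomly chosen early stroke has a lower rSI than a randomly chosen late stroke, with ties counted as one half. The assumption that a lower rSI indicates an earlier stroke, the function name, and the rSI values are illustrative and are not taken from the study data; DeLong's comparison of two correlated AUCs is not shown.

```python
def auc_lower_is_positive(scores_pos, scores_neg):
    """AUC when lower scores indicate the positive class (here, early stroke)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p < q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical rSI values.
early = [1.02, 1.05, 1.08, 1.15]  # onset-to-MRI delay <= 4.5 h
late = [1.10, 1.20, 1.25, 1.30]   # onset-to-MRI delay > 4.5 h
auc = auc_lower_is_positive(early, late)
print(f"AUC = {auc:.4f}")
```

In this toy example, 15 of the 16 early-late pairs are ordered as expected, which gives the AUC.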

Subgroups of onset-to-MRI delays in the 2–9 h target window

Among the 141 MRI sets with known onset-to-MRI delay, 40 (28%) were performed in the 2–9 h target window. Interobserver reproducibility was moderate for both realFLAIR and synthFLAIR sequences (Fleiss κ = 0.69 (95%CI: 0.53–0.84) and 0.65 (95%CI: 0.49–0.81), respectively, p = 0.73). After consensus, concordance between realFLAIR and synthFLAIR was substantial (κ = 0.75 (95%CI: 0.53–0.98)). DWI-FLAIR mismatch was present in 31/40 (78%) realFLAIR and 27/40 (68%) synthFLAIR sequences (p = 0.13). Both sequences had identical accuracy for classifying stroke delay (⩽4.5 or >4.5 h from stroke onset; 31/40 (78%)). The sensitivity and specificity of the DWI-FLAIR mismatch for the identification of ⩽4.5 h AIS were not significantly different between realFLAIR and synthFLAIR (sensitivity: 27/32 (84%) vs 25/32 (78%), p = 0.16; specificity: 4/8 (50%) vs 6/8 (75%), p = 0.16). The 4/40 (10%) discordant labelings corresponded to subjects labeled as DWI-FLAIR mismatch using realFLAIR and no mismatch using synthFLAIR; two subjects had early stroke (2 and 3.25 h) and two subjects had late stroke (4.7 and 6.5 h).

Post-hoc analysis in restless patients

In 16 MRI sets excluded from data partition and main analysis (inadequate FLAIR quality because of major artifacts), interobserver reproducibility for mismatch assessment on synthFLAIR sequence between the five readers was substantial (κ = 0.79 (95%CI: 0.59–0.98)). Illustrative cases are shown in Figure 3.

Figure 3.

Diffusion-weighted imaging (DWI)–fluid-attenuated inversion recovery (FLAIR) mismatch assessment in restless patients. (a) DWI-FLAIR mismatch assessable on the synthFLAIR generated from 1.5 T DWI data in a 73-year-old woman 2 h after symptom onset (subset D). On DWI, a slight diffusion restriction is seen in the right middle cerebral artery territory without signal change on the synthFLAIR. The realFLAIR sequence was excluded from main analysis due to artifacts whereas the synthFLAIR was of diagnostic value. (b) AIS with hemorrhagic transformation in a 59-year-old woman (subset A). The realFLAIR sequence acquired with a 1.5 T MRI presented with severe motion artifacts and was excluded from main analysis. Note the absence of these artifacts on the synthFLAIR sequence.

Discussion

We have demonstrated that the original synthFLAIR model, trained on homogeneous single-center data, could be adapted with a fine-tuning procedure to compute clinically relevant synthFLAIR sequences in a multicenter cohort.

To our knowledge, our study is the first to evaluate the adaptation of synthFLAIR in a large multicentric cohort with various manufacturers, and to propose a technical approach for this adaptation. Supervised deep-learning models’ generalizability is indeed a controversial topic, as evidenced by the recent literature, which has raised concerns about the reliability of models when faced with new heterogeneous target domains in medical imaging.²⁸ Our work suggests the feasibility of adapting a pre-trained model using a specific supervised domain adaptation method to overcome field strength and manufacturer shift from multicenter MRI data. To achieve such a change of scale, we first adjusted its architecture and discarded the ADC map as input. Indeed, the ADC map computed from the native DWI source image introduced signal variability by adding noise that likely affected multi-site translation without providing information relevant to the model’s purpose (see Supplemental Methods 2).

More than 100 comprehensive and primary stroke centers, using both 1.5 and 3 T MRI units from three different manufacturers participated in recruiting stroke patients and acquiring MRI data in this study. Such a broad aggregate leads to data heterogeneity beyond manufacturer and field strength, including MRI model subtypes and variability in imaging protocol and acquisition parameters either on DWI or FLAIR sequences. These variations faithfully reflect the daily clinical workflow and ensure real-world training conditions, thus allowing a widespread applicability without the need to re-train models against other MRI units at a later stage, thanks to the variety of MRI units initially included within each subset. Moreover, we purposely kept all diffusion data, including DWI with ⩾3 gradient-encoding directions and b-value variations, without considering those variations as specific domains (except for the b_low variations). This data heterogeneity and these model development strategies facilitate clinical portability across any 1.5 or 3 T MRI units from three main manufacturers. Each fine-tuned model will be made available as open-source software on http://github.com/NeuroSainteAnne/synthFLAIR, in order to facilitate external validation of our technique by individual teams with different MRI units.

Within the subgroup of DWI data with b_low = 50 s/mm², the fine-tuned model presented the lowest interobserver reproducibility for DWI-FLAIR mismatch assessment as well as the lowest Pearson coefficient of the rSI compared to other subsets. This finding supports the underlying hypothesis that the synthFLAIR is mainly driven by the T2 contrast yielded by b_low = 0 s/mm² images¹⁵ and may explain the lower performance with increasing b_low values.

Subgroup analysis in the 2–9 h target window did not show any differences in interobserver reproducibility and diagnostic accuracy between realFLAIR and synthFLAIR. Despite the lack of statistical significance, synthFLAIR tended to be more “conservative” than realFLAIR in this subgroup analysis, since the four labeling discrepancies could have led to withholding thrombolysis using the synthFLAIR mismatch definition. It should however be noted that two of these four discrepancies were justified (no DWI-FLAIR mismatch using synthFLAIR for subjects with >4.5 h onset-to-MRI delay). As a consequence, if this “conservative” feature is confirmed, synthFLAIR may be “safe” to use (reducing the risk of performing thrombolysis in late stroke and thus reducing iatrogenic hemorrhagic risk) at the expense of the number of treated patients. Pending further research confirming or refuting this result, it thus seems acceptable to use synthFLAIR only in situations where realFLAIR is deemed uninformative (restless patients).

Our initial domain definitions may be questionable as the subsets we selected, based on the MR field strengths and manufacturers, gathered very heterogeneous data, which may have led to a greater variety in the distribution of data than one would expect from a single domain as defined by the framework of the computer vision model. Data partition based on sequence parameters would have been useful to increase data homogeneity in each subset but using smaller groups would have increased the risk of overfitting,²⁹ especially using the fine-tuning strategy.

One of the potential strengths of synthFLAIR is its ability to overcome motion artifacts in restless patients, minimizing artifacts given the short acquisition time of DWI. This is supported by the substantial interobserver reproducibility for DWI-synthFLAIR mismatch evaluation in the post-hoc analysis of restless patients and could have a major impact on clinical practice. However, DWI, and by extension synthFLAIR, can be prone to other artifacts, including geometric distortions and susceptibility artifacts associated with EPI techniques particularly near skull base and temporal lobes.

Our study has several limitations. First, the number of MRI sets for each manufacturer and field strength was unbalanced across the seven subsets. Its impact on each model’s performance after application of the domain adaptation technique may be difficult to assess. However, the smallest group (subset E), trained on only 62 MRI sets, reached relatively good performance compared to other groups (realFLAIR–synthFLAIR concordance κ = 0.82). The number of MRI sets required for fine-tuning the synthFLAIR model may thus be limited. A preliminary ablation study suggests that at least 50 MRIs may be required for this fine-tuning (see Supplemental Methods 3d).

Second, we chose to include both early and follow-up imaging in our study, even though clinical challenges and time constraints are very different in these two situations. This was done in line with the reference study,¹⁶ in order to increase data diversity for model training and hence improve model generalizability.³⁰ Moreover, although reducing acquisition time is not crucial for follow-up imaging, synthFLAIR could still be a supplementary tool in restless patients presenting with motion artifacts on realFLAIR sequences. This approach, however, raises the question of patients with two MRIs in the dataset. From a learning standpoint, considering early and late imaging as independent seems acceptable. Indeed, given the important differences in MR signal, acquisition plane, and image orientation between early and late acquisitions, it seems unlikely that the model could learn patient-specific brain morphology to generate the synthFLAIR signal, particularly since the model is trained on a slice-wise basis. From a statistical standpoint, we removed follow-up imaging from the test set when the patient also had early imaging, in order to satisfy statistical independence assumptions.

In this study, the analysis based on the rSI showed that the AUC for the identification of ⩽4.5 h AIS on synthFLAIR tended to be lower than with the realFLAIR, without reaching statistical significance. The AUC difference between realFLAIR and synthFLAIR was however smaller in the validation dataset (with AUCs respectively equal to 0.85 and 0.84, see Supplemental Table 2, as compared to 0.90 and 0.86 in the test set), suggesting either some overfitting on the validation dataset or a variation due to data sampling. Moreover, quantitative evaluation of the rSI on the FLAIR sequence may represent an additional tool for treatment decisions, but cutoff values vary among studies^14,31–34 and this parameter may not yet replace visual rating for DWI-FLAIR mismatch status in clinical practice.^35,36

As the duration of realFLAIR acquisition was not available for all exams, the time saved in the diagnostic process with synthFLAIR cannot be assessed as clearly as in a single-center study.¹⁶ Due to the retrospective design, the impact of synthFLAIR on patient outcomes and management strategies could not be fully assessed in this study beyond the expected benefit of reduced acquisition time. Our results cannot be extrapolated to stroke mimics,³⁷ as we only included AIS patients. Further studies are needed to extend this synthFLAIR sequence to other pathologies in the setting of suspected AIS, although time management may be less decisive in those clinical situations.

Research perspectives could also include the development of a multi-task model that, beyond generating a synthFLAIR sequence from the DWI, would also predict DWI-FLAIR mismatch status³⁸ to enhance decision-making.

In conclusion, a generative model pre-trained on single-center data and fine-tuned on DWI data from different MRI manufacturers and field strengths can generate clinically relevant synthFLAIR that performs comparably to realFLAIR for assessing DWI-FLAIR mismatch and identifying early AIS at a multicenter scale. Beyond shortening the stroke MR protocol by removing the need to acquire a real FLAIR sequence, synthFLAIR may be a promising alternative to overcome motion artifacts in restless patients at the acute phase of stroke.

Supplemental Material

Supplemental material, sj-docx-1-eso-10.1177_23969873241263418 for Multicenter validation of synthetic FLAIR as a substitute for FLAIR sequence in acute ischemic stroke by Guillaume Hamon, Laurence Legrand, Ghazi Hmeydia, Guillaume Turc, Wagih Ben Hassen, Sylvain Charron, Clement Debacker, Olivier Naggara, Bertrand Thirion, Bailiang Chen, Bertrand Lapergue, Catherine Oppenheim and Joseph Benzakoun in European Stroke Journal

Footnotes

Appendix

Acknowledgements

ETIS Registry Investigators (a list of the ETIS Investigators is given in the ).

Abbreviations

AIS = acute ischemic stroke

AUC = Area under the receiver operating characteristic curve

DWI = Diffusion-weighted imaging

FLAIR = Fluid-attenuated inversion recovery

realFLAIR = real FLAIR

synthFLAIR = synthetic FLAIR

rSI = ratio of signal intensity

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a government grant managed by the French National Research Agency (ANR) as part of the future investment program integrated into France 2030, under grant agreement No. ANR-18-RHUS-0001.

Ethical approval

This study was performed in line with the principles of the Declaration of Helsinki. The study was conducted under the Reference Methodology MR-004 for data protection relating to the processing of retrospective and prospective personal data implemented in the framework of research not involving the human person and approved by our clinical research committee.

Informed consent

Written informed consent was obtained, and data collection and analysis were approved by the ETIS review board.

Guarantor

CO and JB.

Contributorship

All authors reviewed and edited the manuscript and approved the final version of the manuscript.

Data sharing statement

Data analyzed during the study were provided by a third party. Requests for data should be directed to the provider indicated in the Acknowledgments.

Supplemental material

Supplemental material for this article is available online.

References

1. Powers, Rabinstein, Ackerson, et al. Guidelines for the early management of patients with acute ischemic stroke: 2019 update to the 2018 guidelines for the early management of acute ischemic stroke: a guideline for healthcare professionals from the American Heart Association/American Stroke Association. Stroke 2019; 50: e344–e418.

2. Thomalla, Simonsen, Boutitie, et al. MRI-guided thrombolysis for stroke with unknown time of onset. N Engl J Med 2018; 379: 611–622.

3. Escalard, Gory, Kyheng, et al. Unknown-onset strokes with anterior circulation occlusion treated by thrombectomy after DWI-FLAIR mismatch selection. Eur J Neurol 2018; 25: 732–738.

4. Nogueira, Jadhav, Haussen, et al. Thrombectomy 6 to 24 hours after stroke with a mismatch between deficit and infarct. N Engl J Med 2018; 378: 11–21.

5. Albers, Marks, Kemp, et al. Thrombectomy for stroke at 6 to 16 hours with selection by perfusion imaging. N Engl J Med 2018; 378: 708–718.

6. Campbell BCV, Parsons, et al. Thrombolysis guided by perfusion imaging up to 9 hours after onset of stroke. N Engl J Med 2019; 380: 1795–1803.

7. Rao, Mlynash, Christensen, et al. Collateral status contributes to differences between observed and predicted 24-h infarct volumes in DEFUSE 3. J Cereb Blood Flow Metab 2020; 40: 1966–1974.

8. Vagal, Aviv, Sucharew, et al. Collateral clock is more important than time clock for tissue fate. Stroke 2018; 49: 2102–2107.

9. Puig, Shankar, Liebeskind, et al. From “time is brain” to “imaging is brain”: a paradigm shift in the management of acute ischemic stroke. J Neuroimaging 2020; 30: 562–571.

10. Saver JL. Time is brain—quantified. Stroke 2006; 37: 263–266.

11. Provost, Soudant, Legrand, et al. Magnetic resonance imaging or computed tomography before treatment in acute ischemic stroke. Stroke 2019; 50: 659–664.

12. Fink, Kumar, Horkan, et al. The stroke patient who woke up: clinical and radiological features, including diffusion and perfusion MRI. Stroke 2002; 33: 988–993.

13. Thomalla, Cheng, Ebinger, et al. DWI-FLAIR mismatch for the identification of patients with acute ischaemic stroke within 4·5 h of symptom onset (PRE-FLAIR): a multicentre observational study. Lancet Neurol 2011; 10: 978–986.

14. Petkova, Rodrigo, Lamy, et al. MR imaging helps predict time from symptom onset in patients with acute stroke: implications for patients with unknown onset time. Radiology 2010; 257: 782–792.

15. Geraldo, Berner L-P, Haesebaert, et al. Does b1000-b0 mismatch challenge diffusion-weighted imaging-fluid attenuated inversion recovery mismatch in stroke? Stroke 2016; 47: 877–881.

16. Benzakoun, Deslys M-A, Legrand, et al. Synthetic FLAIR as a substitute for FLAIR sequence in acute ischemic stroke. Radiology 2022; 303: 153–159.

17. Mohajer, Eng. External validation of deep learning algorithms for radiologic diagnosis: a systematic review. Radiol Artif Intell 2022; 4: e210064.

18. Yan, Huang, Xia, et al. MRI manufacturer shift and adaptation: increasing the generalizability of deep learning segmentation for MR images acquired with different scanners. Radiol Artif Intell 2020; 2: e190195.

19. Health C for D and R. Artificial intelligence and machine learning in software as a medical device. FDA, https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device (2023, accessed 6 August 2023).

20. Mongan, Moy, Kahn CE. Checklist for artificial intelligence in medical imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell 2020; 2: e200029.

21. Cohen, Luck, Honari. Distribution matching losses can hallucinate features in medical image translation. arXiv:1805.08841 [cs], http://arxiv.org/abs/1805.08841 (2018, accessed 8 February 2022).

22. Hopital Foch. Evaluation of clinical and imaging criteria, and plasma biomarkers of patients receiving an endovascular treatment for an acute ischemic stroke. Clinical Trial Registration NCT03776877, clinicaltrials.gov, https://clinicaltrials.gov/study/NCT03776877 (2022, accessed 1 January 2024).

23. Pedregosa, Varoquaux, Gramfort, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–2830.

24. Cho, Shin. Freeze the discriminator: a simple baseline for fine-tuning GANs. arXiv:2002.10964 [cs, stat], http://arxiv.org/abs/2002.10964 (2020, accessed 10 February 2022).

25. Zhu J-Y, Park, Isola, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv:1703.10593 [cs], http://arxiv.org/abs/1703.10593 (2020, accessed 2 February 2022).

26. Thomalla, Fiebach, Østergaard, et al. A multicenter, randomized, double-blind, placebo-controlled trial to test efficacy and safety of magnetic resonance imaging-based thrombolysis in wake-up stroke (WAKE-UP). Int J Stroke 2014; 9: 829–836.

27. Moskowitz, Pepe MS. Comparing the predictive values of diagnostic tests: sample size and analysis for paired study designs. Clin Trials 2006; 3: 272–279.

28. Zech, Badgeley, Liu, et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med 2018; 15: e1002683.

29. Eche, Schwartz, Mokrane F-Z, et al. Toward generalizability in the deployment of artificial intelligence in radiology: role of computation stress testing to overcome underspecification. Radiol Artif Intell 2021; 3: e210097.

30. Gong, Zhong. Diversity in machine learning. IEEE Access 2019; 7: 64323–64350.

31. Cheng, Boutitie, Nickel, et al. Quantitative signal intensity in fluid-attenuated inversion recovery and treatment effect in the WAKE-UP trial. Stroke 2020; 51: 209–215.

32. Legge, Graham, Male, et al. Fluid-attenuated inversion recovery (FLAIR) signal intensity can identify stroke within 6 and 8 hours. J Stroke Cerebrovasc Dis 2017; 26: 1582–1587.

33. Galinovic, Puig, Neeb, et al. Visual and region of interest–based inter-rater agreement in the assessment of the diffusion-weighted imaging–fluid-attenuated inversion recovery mismatch. Stroke 2014; 45: 1170–1172.

34. Song, Latour, Ritter, et al. A pragmatic approach using magnetic resonance imaging to treat ischemic strokes of unknown onset time in a thrombolytic trial. Stroke 2012; 43: 2331–2335.

35. Scheldeman, Wouters, Dupont, et al. Diffusion-weighted imaging and fluid-attenuated inversion recovery quantification to predict diffusion-weighted imaging-fluid-attenuated inversion recovery mismatch status in ischemic stroke with unknown onset. Stroke 2022; 53: 1665–1673.

36. Cheng, Brinkmann, Forkert, et al. Quantitative measurements of relative fluid-attenuated inversion recovery (FLAIR) signal intensities in acute stroke for the prediction of time from symptom onset. J Cereb Blood Flow Metab 2013; 33: 76–84.

37. Danière, Edjlali-Goujon, Mellerio, et al. MR screening of candidates for thrombolysis: how to identify stroke mimics? J Neuroradiol 2014; 41: 283–295.

38. Polson, Zhang, Nael, et al. Identifying acute ischemic stroke patients within the thrombolytic treatment window using deep learning. J Neuroimaging 2022; 32: 1153–1160.
