Abstract
Introduction
Ultrasound (US) is one of the most widely used imaging modalities in clinical practice, with applications in breast, abdominal, transrectal, and intravascular imaging, prenatal diagnosis, and obstetrics and gynecology, owing to its relative safety (non-ionizing radiation), cost-effectiveness, real-time display, portability, and accessibility 1 . However, real-time display and diagnosis also render the diagnostic results highly operator dependent, with high inter- and intra-observer variability 2 . Advanced automatic US image analysis methods have been investigated intensively to make US-based diagnosis and interventions more objective, accurate, and intelligent 3, 4. US-based radiomics features have been reported to give objective, promising results in characterizing breast biology 5 , gestational age 6 , neonatal respiratory morbidity 7 , thyroid tumors, and lymph node status (LNS) of cervical cancer (CC) 8 .
Radiomics is valuable for diagnostic, prognostic, and predictive analysis in the era of precision medicine because it extracts high-throughput quantitative features from magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and US images 9 . Previous studies used US-based radiomics models for non-invasive preoperative prediction of lymph node metastasis (LNM) in cervical cancer patients 47 . However, studies have indicated that radiomics features are susceptible to variations in scanners, acquisition protocols, and reconstruction settings, which are unavoidable in retrospective and multicenter studies in current clinical practice 10 . The influence of different scanners and automatic segmentation algorithms on US-based radiomics has also been reported 11–13. Therefore, different harmonization solutions have been proposed to improve the reproducibility and stability of radiomics features 14 .
Solutions for feature harmonization can be categorized into two principal approaches: the feature domain approach and the image domain approach. In the feature domain, various methods achieve harmonization by identifying and focusing on reproducible features 15–17. However, this can lead to potentially valuable information being neglected during feature extraction, and no universally accepted criterion exists for defining high reproducibility. In the image domain, standardization of acquisition protocols and reconstruction settings is usually considered for CT, MRI, and PET. US imaging, by contrast, is a unique procedure that relies heavily on the sonographer's knowledge and experience, which renders standardization of acquisition protocols and reconstruction settings impracticable 18 . Deep learning networks, such as convolutional neural networks (CNN) or generative adversarial networks (GAN), have been applied in many studies to harmonize medical images by either image-to-image translation or domain transformation 19, 20. However, studies have also indicated that style transfer may introduce unwanted artifacts or reduce the quality of the quantitative information contained in the images 21, 22. Moreover, few studies have addressed the harmonization of US images using style transfer. The purpose of this study is to investigate the feasibility and accuracy of an adapted CycleGAN network for style transfer in US-based radiomics with images from multiple scanners for patients with early-stage cervical cancer (ECC).
Materials and Methods
Methodological Overview
Figure 1 illustrates the study design flowchart. A style transfer model based on the CycleGAN network was first trained using ECC images and adapted to transfer paired US phantom images from one US device to another; the model was then further trained and tested with clinical US images of ECC by transferring images from four US devices to one specific device; finally, the effect of the adapted model on the accuracy of a radiomics study was tested for ECC patients with confirmed LNS.

Figure 1. Flowchart of the study design. First, the style transfer model based on the CycleGAN network was trained using early cervical cancer (ECC) images and tested with paired ultrasound phantom images transferred from one ultrasound device to one specific device. Finally, the effect of the adapted model on the accuracy of a radiomics study was tested for ECC patients with confirmed LNS.
Data and Preprocessing
Phantom Image
Paired US images were acquired with two different US devices, the Voluson-E8 (GE Healthcare) and the HI VISION Preirus (Hitachi Ltd), using a CIRS multi-purpose multi-tissue ultrasound phantom (Model 040GSE, CIRS Inc., Virginia, USA). US images of three fixed shapes were acquired by an experienced sonographer according to different combinations of the grayscale targets, the anechoic stepped cylinder, and the horizontal and vertical distance points in the phantom, as shown in Figure 2a1-a3. The images were filtered according to the displayed structures, and the “original” and “target” images were paired using landmark registration in 3D Slicer (version 5.0.2).

Figure 2. Ultrasound images of the phantom: a1-a3) phantom images acquired by the Voluson E8 device; b1-b3) phantom images acquired by the HI VISION Preirus device; c1-c3) images generated with the style transfer model (Voluson E8 to HI VISION Preirus).
Clinical Image
We retrospectively collected the ultrasound images of 1707 patients with early-stage cervical cancer at the author's affiliation from 2012 to 2018. These images were obtained from five different scanners: ATL HDI 5000 (Philips), Voluson-E8 (GE Healthcare), Mylab classC (Esaote), ACUSON S2000 (Siemens), and HI VISION Preirus (HITACHI Ltd). The inclusion criteria were as follows: (i) radical hysterectomy and systematic pelvic lymph node dissection; (ii) postoperative histological confirmation of cervical cancer and lymph node status; and (iii) standard ultrasound examination performed within 2 weeks prior to hysterectomy. The exclusion criteria were: (i) incomplete clinical data or inability to perform statistical analysis; (ii) preoperative chemotherapy or radiotherapy; and (iii) a history of malignancy or combined malignancy. Micrometastatic lymph nodes were not considered in this study because of a lack of relevant examinations. After these criteria were applied, a total of 169 cases were included in the final analysis, and all cases were staged according to the FIGO classification (2018) by a US physician with 5 years of experience in tumor marking. The detailed clinical characteristics of these patients can be found in Table 1.
Table 1. Demographic Statistics of Patients in the Radiomics Dataset.
(1) The p value is calculated from the univariate association test between subgroups. (2) Fisher's exact test and the chi-square test were used for categorical variables. LNM: lymph node metastasis, -: negative, +: positive, SD: standard deviation
Style Transfer Model
The structure of the style transfer model in this study is adapted from CycleGAN 20, 23, which contains two generator-discriminator pairs (Gxy, Dx, Gyx, Dy) 24 . As shown in Figure 3, when an X-style image is input to the generator Gxy, a Y-style image is output. Similarly, when a Y-style image is input to the generator Gyx, an X-style image is output. Each discriminator D distinguishes between the generated image and the target image, while each generator G strives to generate images that are indistinguishable from the target.

Figure 3. The structure of the style transfer network, which contains two generator-discriminator pairs (Gxy, Dx, Gyx, Dy). The top right is a schematic representation of the underlying CycleGAN framework with two distributions X and Y, generators Gxy and Gyx for mapping X to Y and Y to X, and two discriminators Dx and Dy for discriminating the transformed images. The left side shows the modified generators, in which an attention module was added to the second convolution module and spectral normalization was applied in each convolution module to improve the learning ability of the model.
Self-Attention (SE Model)
In the second and subsequent convolutional layers, we incorporated the attention mechanism proposed in 25 , which aggregates information across all feature map channels. Global mean pooling produces, for each channel, a single value with a global receptive field. The resulting weight coefficient vector has one entry per feature map channel and reflects the importance of that channel. The weight coefficients are then multiplied with the feature map, reinforcing important features and suppressing unimportant ones.
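The channel reweighting described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the network used in the study: the two-layer bottleneck, the reduction ratio `r`, and the random weights `w1` and `w2` are assumptions introduced only for the example (in a trained model these weights would be learned).

```python
import numpy as np

def channel_attention(feature_map, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    w1 (C//r, C) and w2 (C, C//r) are hypothetical bottleneck weights.
    """
    # Global mean pooling: one value per channel with a global receptive field.
    pooled = feature_map.mean(axis=(1, 2))              # shape (C,)
    # Small bottleneck with ReLU, then a sigmoid gate in [0, 1] per channel.
    hidden = np.maximum(w1 @ pooled, 0.0)               # shape (C//r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # shape (C,)
    # Multiply each channel by its importance weight.
    return feature_map * weights[:, None, None], weights

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                                 # illustrative sizes
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
out, weights = channel_attention(x, w1, w2)
```

Channels whose weight is close to 1 pass through almost unchanged, while channels with a weight close to 0 are suppressed.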
Total Loss
CycleGAN's loss function consists of six components, which can be categorized into three major types:
Adversarial loss: this component aims to ensure that the generated images from the source domain resemble the target domain images and vice versa. It encourages the generator to produce realistic images that can fool the discriminator. The source domain loss
Cycle consistency loss: it ensures the sample transformed from one domain to another remains unchanged. This loss term is crucial for preserving content during style transfer. It maintains the integrity of the original sample by penalizing any discrepancies between the input and the reconstructed output after the cycle. The loss of cycle consistency is as follows:
The Total Loss is as follows:
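For reference, the standard CycleGAN objective on which these losses are based can be written as below. This is the generic formulation, not necessarily the exact hybrid loss used in this study. The notation is assumed to follow this section, with D_y judging images in the Y domain and D_x judging images in the X domain. The identity term (taken here as the third loss type) and the weights \lambda_{cyc} and \lambda_{id} are assumptions.

```latex
% Adversarial losses (one per translation direction)
\mathcal{L}_{GAN}(G_{xy}, D_y) =
  \mathbb{E}_{y}\left[\log D_y(y)\right]
  + \mathbb{E}_{x}\left[\log\left(1 - D_y(G_{xy}(x))\right)\right]
\mathcal{L}_{GAN}(G_{yx}, D_x) =
  \mathbb{E}_{x}\left[\log D_x(x)\right]
  + \mathbb{E}_{y}\left[\log\left(1 - D_x(G_{yx}(y))\right)\right]

% Cycle consistency loss (forward and backward cycles)
\mathcal{L}_{cyc}(G_{xy}, G_{yx}) =
  \mathbb{E}_{x}\left[\lVert G_{yx}(G_{xy}(x)) - x \rVert_1\right]
  + \mathbb{E}_{y}\left[\lVert G_{xy}(G_{yx}(y)) - y \rVert_1\right]

% Identity loss (assumed third type)
\mathcal{L}_{id}(G_{xy}, G_{yx}) =
  \mathbb{E}_{y}\left[\lVert G_{xy}(y) - y \rVert_1\right]
  + \mathbb{E}_{x}\left[\lVert G_{yx}(x) - x \rVert_1\right]

% Total loss
\mathcal{L}_{total} =
  \mathcal{L}_{GAN}(G_{xy}, D_y) + \mathcal{L}_{GAN}(G_{yx}, D_x)
  + \lambda_{cyc}\,\mathcal{L}_{cyc}(G_{xy}, G_{yx})
  + \lambda_{id}\,\mathcal{L}_{id}(G_{xy}, G_{yx})
```

Counting two adversarial, two cycle consistency, and two identity terms gives six components in three types.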
Training Details
To speed up convergence, prevent model collapse, and stabilize training, spectral normalization (SN) was added to the CIL (Conv, InstanceNorm, LeakyReLU) and CTIR (Conv, Transpose, InstanceNorm, ReLU) modules to satisfy 1-Lipschitz continuity 27, 28.
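As an illustration of what spectral normalization does to a layer, the sketch below estimates the largest singular value of a weight matrix by power iteration and divides the matrix by it, so that the resulting linear map has a spectral norm of about 1. It is a standalone NumPy example under assumed settings: the matrix size and iteration count are arbitrary, and a convolution kernel would first be reshaped to two dimensions.

```python
import numpy as np

def spectral_normalize(weight, n_iter=200, eps=1e-12):
    """Divide a 2-D weight matrix by an estimate of its largest singular value."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=weight.shape[0])
    for _ in range(n_iter):
        # Alternate between right and left singular vector estimates.
        v = weight.T @ u
        v /= np.linalg.norm(v) + eps
        u = weight @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ weight @ v          # estimated largest singular value
    return weight / sigma, sigma

rng = np.random.default_rng(1)
w = rng.normal(size=(16, 32))       # illustrative weight matrix
w_sn, sigma = spectral_normalize(w)
largest_sv = np.linalg.svd(w_sn, compute_uv=False)[0]
```

After normalization `largest_sv` is approximately 1, which bounds how much the layer can amplify its input.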
During model training, we observed an imbalance between the generator and discriminator, which could lead to model instability or even collapse. To address this problem, we adopted the two time-scale update rule proposed by Heusel et al 29 . In simple terms, this approach uses different learning rates for the generator and discriminator. We set the learning rate to 0.001 for the generator and 0.004 for the discriminator. This allows the discriminator to respond promptly and provide feedback to update the generator, so that the generated images exhibit more characteristics of the target domain.
In addition, a label smoothing strategy was applied that scales the labels to 90% of their original values, in order to increase the training difficulty for the discriminator and to balance the generator and discriminator.
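The effect of scaling the labels to 90% can be illustrated with a small pure-Python sketch. A binary cross-entropy discriminator loss is assumed here purely for illustration; the study's actual adversarial loss may take a different form.

```python
import math

SMOOTHING = 0.9  # labels are scaled to 90% of their original value

def bce(prediction, label, eps=1e-7):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def discriminator_loss(pred_real, pred_fake, smoothing=SMOOTHING):
    """Loss on one real and one generated image with smoothed labels."""
    real_label = 1.0 * smoothing   # 1.0 becomes 0.9
    fake_label = 0.0 * smoothing   # 0.0 stays 0.0
    return bce(pred_real, real_label) + bce(pred_fake, fake_label)

# An over-confident discriminator is penalized once labels are smoothed.
confident = discriminator_loss(pred_real=0.999, pred_fake=0.001)
calibrated = discriminator_loss(pred_real=0.9, pred_fake=0.001)
```

With a smoothed real label of 0.9, the loss is lowest when the discriminator outputs about 0.9 for real images, so it is discouraged from becoming over-confident and overpowering the generator.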
Phantom Study
Because of the real-time display and operator-dependent nature of ultrasound imaging, there is a high level of inter- and intra-observer variability, which makes the diagnostic results highly subjective. It is difficult to obtain paired images from the same patient using different scanners, which makes it challenging to assess the level of feature harmonization achieved by the style transfer model. To address this limitation and mitigate operator variability, the position and shape of calibration points in the ultrasound phantom were specified, which allowed us to obtain paired ultrasound images from different scanners. This phantom methodology has also been applied in 30–35 to evaluate GAN networks.
We treated the calibration points in the images as target regions and extracted radiomics features from them. By calculating and comparing the Pearson correlation coefficients of the calibration point features among the source domain images, generated images, and target domain images, we visualized the level of feature harmonization. Additionally, the peak signal-to-noise ratio (PSNR) 36 was used to evaluate image quality, the structural similarity index (SSIM) 37 to assess whether the image structure had changed, and mutual information (MI) 38 to measure the information shared between two random variables and thereby quantify the similarity between two images. The Fréchet Inception Distance (FID) 39 , a metric commonly used to evaluate the quality and diversity of generated images in image generation tasks, was also used; it compares the distribution of generated images with that of real images in a specific feature space.
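Three of these metrics are simple enough to sketch directly. The NumPy example below computes PSNR, a histogram-based MI estimate, and the Pearson correlation coefficient on synthetic data. The images, noise levels, and the bin count of 32 are assumptions made for illustration and are not the settings used in the study.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_value ** 2 / mse))

def mutual_information(a, b, bins=32):
    """Mutual information in bits, estimated from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)    # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of b, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def pearson(x, y):
    """Pearson correlation coefficient between two feature vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic stand-ins: "generated" is closer to "target" than "source" is.
rng = np.random.default_rng(0)
target = rng.integers(0, 256, size=(64, 64)).astype(float)
generated = np.clip(target + rng.normal(0, 5, size=target.shape), 0, 255)
source = np.clip(target + rng.normal(0, 40, size=target.shape), 0, 255)

psnr_generated, psnr_source = psnr(target, generated), psnr(target, source)
mi_generated = mutual_information(target, generated)
mi_source = mutual_information(target, source)
```

In this toy setting both PSNR and MI are higher for the image that is closer to the target. SSIM and FID need windowed statistics and a pretrained feature extractor, respectively, and are omitted from the sketch.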
Predicting LNM After Harmonization
The validated style transfer models were applied to clinical images based on the report by Yi et al 12 . ECC US images from four different scanners, namely ATL HDI 5000 (Philips), Voluson-E8 (GE Healthcare), Mylab classC (Esaote), and ACUSON S2000 (Siemens), were transferred to the style of the HI VISION Preirus (HITACHI Ltd). Both CycleGAN and its improved versions (CycleGAN + SN, CycleGAN + SN + SE, and CycleGAN + SN + SE + Mix) were applied to the same set of clinical images. A fast image style transfer model based on Fourier domain adaptation (FDA) was used for comparison; it achieves style transfer by swapping the low-frequency components of the amplitude spectra of two images 40 . For each model, features were extracted from the source domain images and the generated images using the same mask and pyradiomics. The optimal features were selected using the Mann-Whitney U test and the Least Absolute Shrinkage and Selection Operator (LASSO), and the elastic net parameters were tuned using ten-fold cross-validated ridge regression to avoid overfitting 41 . The λ coefficient was adjusted to maximize the area under the receiver operating characteristic (ROC) curve (AUC). Radiomics scores were calculated as a linear combination of the selected features, and predictive models were established before and after harmonization.
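The last two steps, a radiomics score as a linear combination of selected features and its evaluation by AUC, can be sketched in pure Python. The feature names, coefficients, intercept, and patient values below are hypothetical placeholders, not the features or coefficients selected in this study.

```python
def radiomics_score(features, coefficients, intercept=0.0):
    """Linear combination of selected features (hypothetical coefficients)."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical selected features and coefficients (for illustration only).
coefficients = {"feature_a": 0.8, "feature_b": -0.5}
intercept = -0.2

# Hypothetical patients: (feature values, label), where 1 = LNM and 0 = non-LNM.
patients = [
    ({"feature_a": 1.2, "feature_b": 0.3}, 1),
    ({"feature_a": 0.4, "feature_b": 0.9}, 0),
    ({"feature_a": 0.9, "feature_b": 0.1}, 1),
    ({"feature_a": 0.7, "feature_b": 0.2}, 0),
    ({"feature_a": 0.5, "feature_b": 0.0}, 1),
]

scores = [radiomics_score(f, coefficients, intercept) for f, _ in patients]
labels = [label for _, label in patients]
auc_value = auc(scores, labels)
```

The pairwise definition of AUC used here is equivalent to the Mann-Whitney U statistic divided by the number of positive-negative pairs, which links the evaluation to the test used for feature screening.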
Statistical Analysis
PSNR, SSIM, FID, and Pearson's correlation coefficient were calculated in Python (version 3.7.0). Statistical analyses were performed in R (version 3.6.0) and OriginPro 2016. Key feature selection and logistic regression model building were done using the “glmnet” package. For continuous clinical variables, a two-sample t test was used. For categorical variables, Fisher's exact test and the chi-square test were used. For all tests, p < 0.05 was considered statistically significant.
Results
Phantom Data Results
As shown in Figure 4, the performance of the final model was compared with that of FDA, CycleGAN, and its improved variants using ultrasound phantom data. Figure 4a compares image quality, structure, and feature-domain metrics before and after harmonization: for the original versus generated images, the PSNR, SSIM, FID, and MI were 11.18 ± 0.69 versus 15.45 ± 0.55, 0.16 ± 0.006 versus 0.17 ± 0.006, 224.48 versus 206.18, and 6.85 ± 0.11 versus 4.78 ± 0.07, respectively. Figure 4b presents the Pearson correlation coefficients and heat maps for radiomics features extracted from the original and transferred images, each correlated with those from the target images. The average correlation coefficients with the target images were 0.60 (95% confidence interval (CI), 0.53–0.65) for the original images and 0.81 (95% CI, 0.77–0.86) for the generated images.

Figure 4. Evaluation of image quality, structural similarity, and radiomics reproducibility using US phantom images after style transfer: (a) comparison of image quality and structure metrics between original and generated images; (b) Pearson correlation coefficients and heat maps of the radiomics features extracted from the original and generated images in correlation with those from the target images.
Clinical Data Results
The patient data in the final analysis were divided into a training set (n = 118) and a validation set (n = 51) in a 7:3 ratio. The training set and validation set consisted of 81 and 37 cases of confirmed LNM, and 39 and 12 cases of confirmed non-LNM, respectively. All style transfer models were tested, and the generated and original images were evaluated using the evaluation metrics from the phantom study. Because paired images from different scanners were unavailable, only the MI and FID metrics were used. A typical visual comparison of the original and transferred images is shown in Figure 5. The image quality of the transferred images compared with the original ones is presented in Table 2; the CycleGAN + SN + SE + Mix model achieved the best FID and MI, 26.11 and 4.78 ± 0.07, respectively.

Figure 5. Original images from different ultrasound devices and their transferred images: a1-a4) original images from SIEMENS_SONLINE, MYCLASS, GE-Voluson E8, and HDI5000; b1-b4) corresponding generated images in the style of HITACHI-Preirus.
Table 2. Performance of the Style Transfer Models Evaluated on the Generated Images Using the Proposed Evaluation Metrics.
A total of 4, 15, 9, 9, and 5 radiomics features were selected out of 451 features for the FDA, CycleGAN, CycleGAN + SN, CycleGAN + SN + SE, and CycleGAN + SN + SE + Mix models, respectively, after the Mann–Whitney U test and the LASSO analysis. The detailed features and their corresponding coefficients, as well as the radiomics score calculation, are shown in supplemental material S1 file 2. The performance of the LNM prediction radiomics models with features extracted from images generated by the different style transfer models is shown in Figure 6a, with AUCs ranging from 0.73 (95% CI: 0.57-0.89) to 0.85 (95% CI: 0.74-0.96). Figure 6b compares the predictive accuracy for LNM in patients with ECC between radiomics models constructed with original images and with style-transferred images. The AUC was 0.78 (95% CI: 0.64-0.93) for original images and 0.85 (95% CI: 0.74-0.96) for style-transferred images. Detailed performance and comparisons among these models are presented in Table 3.

Figure 6. Radiomics performance with transferred ultrasound images in the prediction of lymph node metastasis for patients with cervical cancer: (a) radiomics performance with images generated by different style transfer models; (b) comparison of radiomics performance between original ultrasound images and transferred images.
Table 3. Comparison of Prediction Model Performance Between Training Cohort and Validation Cohort.
Notes: AUC: area under curve, ACC: accuracy, SPE: specificity, SEN: sensitivity, SN: spectral normalization, SE: self-attention, Mix: hybrid loss function, FDA: Fourier domain adaptation, CycleGAN: Cycle Generative Adversarial Network
Discussion
In this study, a CycleGAN-based style transfer network was adapted and trained to transfer US images from different devices to one specific device, improving image quality and homogeneity and thereby reducing the impact of different devices on radiomics for ECC patients. In preoperative LNM prediction, the radiomics model built on generated US images achieved an AUC of 0.85, compared with 0.78 for the model built on original images.
The phantom results demonstrated that the images generated by style transfer had better image quality and preserved stable structures, with a better SSIM and a higher PSNR. Pearson correlation analysis demonstrated that radiomics features extracted from the generated images were highly correlated with those from the target images, indicating higher reproducibility than with the original images. Comparisons with recent research on image harmonization are summarized in Table 4. The AUC of the radiomics model with features extracted from US images acquired from multiple devices was 0.78 (95% CI, 0.64-0.93), which was higher than the AUCs of 0.66 (95% CI, 0.59-0.73) in the training cohort and 0.61 (95% CI, 0.50-0.72) in the validation cohort reported by Yi et al 12 . With the proposed style transfer model, the AUC of the radiomics model for LNM prediction improved to 0.85 (95% CI, 0.74-0.96) after US images from the other four devices were transferred to the style of the HI VISION Preirus (HITACHI Ltd). This was higher than the reported best achievable AUC of 0.80 ± 0.17 with images from an individual US device.
Table 4. Recent Research on US Harmonization and Cervical Cancer Lymph Node Metastasis.
Previous studies have shown that radiomics methods can non-invasively predict the preoperative lymph node status in cervical cancer patients 44–47. However, these studies did not further investigate the reproducibility and stability of radiomics features. While Haberl et al 20 considered the impact of different centers on PET radiomics, they did not analyze the influence of different scanners, acquisition protocols, and reconstruction settings on radiomics features. Yi et al demonstrated that the discriminative accuracy of US-based radiomics could differ by 17.8% when features were extracted from different US devices 12 . One way to increase the reproducibility of radiomics features is to harmonize their statistical properties by normalization or batch-effect correction using the ComBat method 48, 49. In the image domain, standardization of image acquisition, post-processing of raw sensor-level image data, data augmentation techniques, and style transfer are usually applied to harmonize radiomics features 50 . GAN and neural style transfer (NST) techniques, or a combination of both, have been investigated intensively with CT, MRI, and PET images to address the variability across multicenter radiomics studies 10 . Previously, Liu et al proposed a novel and general style transfer framework that removes the appearance shifts of US images to improve US image segmentation 51 . Another study on CycleGAN showed that pseudo-anatomical images generated from breast US provide a more intuitive display, enhance tissue anatomy, and preserve tumor geometry, and can potentially improve diagnoses and clinical outcomes 42 . Similarly, research on cardiac ultrasound has demonstrated that CycleGAN greatly improves image quality and image feature harmonization 43 . However, a review also notes that deep learning is a ‘black box’ approach, and the lack of interpretability of the models and of the deep features they generate is seen as a key limitation in clinical applications 52 .
Combining feature domain and image domain harmonization for US-based radiomics is of great clinical value for future studies. In this study, only US images of cervical cancer were tested for LNM prediction. US images of other cancer types, and of diseases other than cervical cancer, are potential subjects for future studies with our model. Images from multiple centers are also needed to further validate the accuracy of the proposed style transfer models. In addition, style transfer networks can be applied not only to ultrasound but also to other modalities such as MRI and CT. Because devices and scanning sequence parameters differ in these modalities as well, this method could also be used for radiomics normalization there.
Conclusion
The adapted CycleGAN network was feasible and accurate for converting US images acquired from different US devices into images in the style of one specific device, improving the image quality and the performance of radiomics studies.
Supplemental Material
sj-docx-1-tct-10.1177_15330338241302237 - Supplemental material for Radiomics Harmonization in Ultrasound Images for Cervical Cancer Lymph Node Metastasis Prediction Using Cycle-GAN
Supplemental material, sj-docx-1-tct-10.1177_15330338241302237 for Radiomics Harmonization in Ultrasound Images for Cervical Cancer Lymph Node Metastasis Prediction Using Cycle-GAN by Zeshuo Zhao, Yuning Qin, Kai Shao, Yapeng Liu, Yangyang Zhang, Heng Li, Wenlong Li, Jiayi Xu, Jicheng Zhang, Boda Ning, Xianwen Yu, Xiance Jin and Juebin Jin in Technology in Cancer Research & Treatment
Footnotes
Abbreviations
Author's Contributions
Jue-bin Jin, Xian-ce Jin, Ze-shuo Zhao, and Ji-cheng Zhang made substantial contributions to the conception of the study. Jue-bin Jin, Xian-ce Jin, Ze-shuo Zhao, Ji-cheng Zhang, and Xian-wen Yu critically revised the manuscript for important intellectual content. Jue-bin Jin, Xian-ce Jin, and Ze-shuo Zhao agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Jue-bin Jin, Xian-ce Jin, Ze-shuo Zhao, Ji-cheng Zhang, Xian-wen Yu, Kai Shao, Yang-yang Zhang, Ya-peng Liu, Heng Li, and Wen-long Li analyzed the data. Jue-bin Jin, Xian-ce Jin, Ze-shuo Zhao, Ji-cheng Zhang, Xian-wen Yu, Kai Shao, Yang-yang Zhang, Ya-peng Liu, and Bo-da Ning interpreted the data. Ya-peng Liu, Yang-yang Zhang, and Kai Shao checked and verified the clinical data. Ji-cheng Zhang, Heng Li, Wen-long Li, Yu-ning Qin, and Jia-yi Xu acquired the data. Ze-shuo Zhao drafted the manuscript.
Data Availability Statement
Data is available upon request to the corresponding author.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics Statement
The study was conducted in accordance with the Declaration of Helsinki and approved by the Research Ethics Committee (ECCR no.2019059) of the author's hospital. Written informed consent was waived for this retrospective study to maintain patient data confidentiality.
Funding
This research was supported partially by a key project of Zhejiang Natural Science Foundation [Z24A050009], a Key project of Zhejiang Provincial Health Science and Technology Program [WKJ-ZJ-2437], a Major project of Wenzhou Science and Technology Bureau [ZY2022016, ZY2020011], a project of Wenzhou Science and Technology Bureau (Y2023798), Zhejiang Engineering Research Center for innovation and application of Intelligent Radiotherapy Technology, Zhejiang-Hong Kong Precision Theranostics of Thoracic Tumors Joint Laboratory, and Wenzhou key Laboratory of basic science and translational research of radiation oncology.
Supplemental Material
Supplemental material for this article is available online.
References
