Abstract
BACKGROUND:
Plane-wave imaging is widely employed in medical imaging due to its ultra-fast imaging speed. However, the image quality is compromised. Existing techniques to enhance image quality tend to sacrifice the imaging frame rate.
OBJECTIVE:
The study aims to reconstruct high-quality plane-wave images while maintaining the imaging frame rate.
METHODS:
The proposed method utilizes a U-Net-based generator incorporating a multi-scale convolution module in the encoder to extract information at different levels. Additionally, a Dynamic Criss-Cross Attention (DCCA) mechanism is proposed in the decoder of the U-Net-based generator to extract both local and global features of plane-wave images while avoiding interference caused by irrelevant regions.
RESULTS:
In the reconstruction of point targets, the experimental images achieved a reduction in Full Width at Half Maximum (FWHM) of 0.0499 mm, compared to the Coherent Plane-Wave Compounding (CPWC) method using 75-beam plane waves. For the reconstruction of cyst targets, the simulated image achieved a 3.78% improvement in Contrast Ratio (CR) compared to CPWC.
CONCLUSIONS:
The proposed model effectively addresses the issue of unclear lesion sites in plane-wave images.
Introduction
Traditional ultrasound imaging presents limitations in frame rate. Consequently, plane-wave techniques, which offer far higher imaging frame rates, have gained significant traction in the medical field. However, these techniques compromise image quality. Beam synthesis methods [1, 2, 3] offer a moderate enhancement in image quality; nevertheless, the compounding process reduces the frame rate. With the emergence of deep learning in computer vision [4, 5], researchers have started exploring the adaptive learning capability of deep neural networks to obtain a nonlinear mapping between low-resolution and high-resolution images, thereby enhancing the clarity of medical images.
Dong et al. first introduced CNNs for image super-resolution reconstruction, achieving superior results compared to interpolation-based methods. Attention mechanisms are widely used to mitigate the overemphasis on low-value feature information. Wei et al. proposed the CDC model, incorporating divide-and-conquer attention to achieve sub-regional image reconstruction [6]. Chen et al. [7] proposed a multi-attention augmented network, which stacks attention augmentation modules into a deep residual architecture to utilize the complementary information from multiple representation stages fully. Zhou et al. [8] proposed an Enhanced Generative Adversarial Network to enhance high-frequency information while obtaining multi-scale features. Wang et al. incorporated a higher-order degradation modeling process in image reconstruction, effectively enhancing the texture details of the images [9]. Compared to beam synthesis methods, reconstruction models based on neural networks produce visually more appealing images and are closer to practical application requirements. However, these methods often encounter inconsistent training results and slow convergence speeds.
This paper proposes an end-to-end plane-wave image reconstruction method. Specifically, the main contributions are as follows:
We propose the MD-GAN model for plane-wave image reconstruction. The model employs a U-Net-based generator and a Patch discriminator. Specifically:
(1) We incorporate a multi-scale convolution module within the U-Net encoder to capture information from the input image at different frequencies.
(2) We propose Dynamic Criss-Cross Attention (DCCA) in the U-Net decoder to avoid interference caused by irrelevant regions in the single-beam plane-wave image during reconstruction; DCCA also enhances the texture details and general information of the image.
(3) The MD-GAN combines the adversarial loss with the L1-Structural Similarity index (L1-SSIM) loss and the perceptual loss in the generator's loss function.
We conducted extensive experiments on the PICMUS 2016 dataset [10].
The single-beam plane-wave image exhibits significant scattering effects, blurred texture, and masked regions. We propose the MD-GAN model to address these issues. The overall architecture of the model is depicted in Fig. 1. A U-Net-based generator and a Patch discriminator are used, with a single-beam plane-wave image as the input to the generator and images compounded from 75 plane waves at different angles as the actual labels. The generator produces the corresponding super-resolution image from the input, and the discriminator compares the generated image with the actual label to distinguish real from generated images.
Multi-scale convolution module
Inspired by the multidimensional feature extraction in the literature [11], this paper designs a multiscale module to capture the scattering features in each dimension through convolutional operations at different scales to alleviate scattering interference on image reconstruction. The structure of the multiscale convolution module is depicted in Fig. 2, and Table 1 presents the main parameters of each layer.
Main parameters of multi-scale convolution module
Schematic diagram of the overall structure of the model.
Schematic diagram of multi-scale convolution module structure.
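The parallel-branch idea of the multi-scale module can be sketched as follows. This is a minimal NumPy illustration, not the module itself: the kernel sizes 3/5/7 and the random placeholder weights are assumptions for demonstration and do not reproduce the learned parameters listed in Table 1.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Single-channel 2-D convolution with zero padding ('same' output size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def multiscale_features(img, kernel_sizes=(3, 5, 7), rng=None):
    """Extract feature maps at several receptive-field sizes and stack them
    along a channel axis, mimicking parallel convolution branches whose
    outputs are concatenated."""
    rng = rng or np.random.default_rng(0)
    maps = []
    for k in kernel_sizes:
        kernel = rng.standard_normal((k, k)) / k  # placeholder weights
        maps.append(conv2d_same(img, kernel))
    return np.stack(maps, axis=0)  # shape: (len(kernel_sizes), H, W)

img = np.random.default_rng(1).standard_normal((32, 32))
feats = multiscale_features(img)
print(feats.shape)  # (3, 32, 32)
```

Small kernels respond mainly to high-frequency detail (speckle, point-target edges), while larger kernels average over wider regions and capture low-frequency structure, which is the separation the module relies on.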
In image reconstruction, the widely used self-attention mechanism comprehensively captures an image's global structure but fails to extract local image features. Drawing on the local-feature enhancement of reference [12], this study proposes the DCCA mechanism, which introduces a sliding window on top of Criss-Cross Self-Attention. The sliding window not only enhances local features but also prevents the occluded regions of the single-beam plane-wave image from interfering with the reconstruction of the target region. DCCA establishes connections among the pixels inside the sliding window by performing two cross operations within it. By adjusting the sliding window's size and stride, the proportion of global and local features extracted by DCCA can be tuned: a larger stride and smaller window increase the weight of local feature extraction, whereas a smaller stride and larger window increase the weight of global feature extraction. The implementation process of DCCA is depicted in Fig. 3.
Schematic diagram of the Dynamic Criss-Cross Attention structure.
In the first step, a weight matrix is obtained. Three 1 × 1 convolutions project the input feature map into query, key, and value maps; for each position inside the sliding window, the affinities between its query vector and the key vectors lying on the same row and column are computed and normalized with a softmax, yielding the attention weight matrix.
In the second step, weights are applied to the input feature map. The weight vector of each position is used to aggregate the value vectors along its criss-cross path, producing an updated feature map.
In the third step, the first and second steps are repeated. Ultimately, the feature region of each position covers every pixel in the sliding window, since two consecutive cross operations connect any pair of positions.
The deep layer of the U-Net decoder utilizes DCCA with a relatively small sliding window size to focus on enhancing texture information. In contrast, the shallow layer employs DCCA with a relatively sizeable sliding window size to focus on the fusion of general information. The specific parameters of DCCA in the U-Net decoder are shown in Table 2.
Main parameters of DCCA in decoder
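The two cross operations can be sketched in NumPy as follows. This is a sketch under stated assumptions, not the trained mechanism: the projection matrices are random placeholders, and a single 8 × 8 feature map stands in for one sliding-window position, omitting the per-window tiling and stride of the actual model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def criss_cross_attention(feat, Wq, Wk, Wv):
    """One pass of criss-cross attention: every pixel attends to the pixels
    in its own row and column. feat has shape (H, W, C)."""
    H, W, C = feat.shape
    q, k, v = feat @ Wq, feat @ Wk, feat @ Wv
    out = np.zeros_like(v)
    for i in range(H):
        for j in range(W):
            # keys/values on the criss-cross path of (i, j); the center pixel
            # appears in both row and column, which is harmless for a sketch
            path_k = np.concatenate([k[i, :, :], k[:, j, :]], axis=0)  # (H+W, C)
            path_v = np.concatenate([v[i, :, :], v[:, j, :]], axis=0)
            attn = softmax(path_k @ q[i, j])   # (H+W,) attention weights
            out[i, j] = attn @ path_v          # weighted sum of values
    return out

rng = np.random.default_rng(0)
H = W = C = 8
feat = rng.standard_normal((H, W, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
# two passes propagate information across the whole window, as in DCCA
once = criss_cross_attention(feat, Wq, Wk, Wv)
twice = criss_cross_attention(once, Wq, Wk, Wv)
print(twice.shape)  # (8, 8, 8)
```

After the first pass, each pixel has mixed with its row and column; after the second, any two pixels in the window are connected through the shared intersection of their paths, which is why two crosses suffice.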
This study introduces L1-SSIM loss and perceptual loss on top of the original adversarial loss. The combination of L1 loss and MS-SSIM loss (L1-SSIM loss) considers the generated image’s gray value and structural information. The perceptual loss function encourages the generated and actual images to contain similar high-level semantic information.
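The pixel/structure blend can be sketched as below. Two assumptions are made for brevity: a single-window global SSIM replaces the multi-scale MS-SSIM used in the model, and the blend weight 0.84 follows a common choice in the literature rather than a value from this paper; the perceptual loss is omitted, since it requires a pretrained feature network.

```python
import numpy as np

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Global SSIM for images scaled to [0, 1] (single-window variant,
    not the multi-scale MS-SSIM used in the full loss)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def l1_ssim_loss(pred, target, alpha=0.84):
    """Weighted sum of a structural term (1 - SSIM) and a pixel-wise L1
    term; alpha = 0.84 is an assumed blend weight, not a paper value."""
    l1 = np.abs(pred - target).mean()
    return alpha * (1.0 - ssim(pred, target)) + (1.0 - alpha) * l1

x = np.random.default_rng(0).random((64, 64))
print(abs(round(l1_ssim_loss(x, x), 6)))  # 0.0 for identical images
```

The SSIM term drives the generated image toward the right structure, while the L1 term anchors the absolute gray values, which is why the combination covers both aspects mentioned above.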
Experimental results and analysis
Experimental details
For this study, we utilized the dataset from the 2016 Plane-Wave Imaging Challenge in Medical Ultrasound (PICMUS), comprising 360 image sets: 120 sets of simulated images generated with Field II, 120 sets of experimental images acquired with the Verasonics Vantage 256 research scanner and an L11 probe, and 120 sets of human carotid images. Image flipping operations were performed to augment the dataset and enhance its diversity, resulting in 1350 image sets. Subsequently, the dataset was divided into training and test sets at an 8:2 ratio.
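A minimal sketch of this preparation pipeline is shown below. The paper does not state which flip axes were used or how the split was randomized, so the horizontal/vertical flips and the shuffle seed here are assumptions for illustration.

```python
import numpy as np

def augment(images):
    """Flip-based augmentation: each image contributes itself plus a
    horizontal and a vertical flip (assumed axes)."""
    out = []
    for img in images:
        out += [img, np.fliplr(img), np.flipud(img)]
    return out

def split(data, train_frac=0.8, seed=0):
    """Shuffle and divide into training and test sets at an 8:2 ratio."""
    idx = np.random.default_rng(seed).permutation(len(data))
    cut = int(len(data) * train_frac)
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

images = [np.zeros((4, 4)) for _ in range(10)]
train, test = split(augment(images))
print(len(train), len(test))  # 24 6
```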
Different evaluation measures were selected to assess the reconstruction performance for each type of image. The Full Width at Half Maximum (FWHM) and Contrast Ratio (CR) were utilized to evaluate the reconstruction of point and cyst target images, respectively. Furthermore, the PSNR and SSIM were employed to evaluate the reconstruction of in vivo actual images.
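These metrics can be computed as sketched below. The FWHM is read off a 1-D intensity profile through a point target, and PSNR follows its standard definition; the CR formula in decibels is a common convention and an assumption here, since the paper does not spell out its exact definition.

```python
import numpy as np

def fwhm(profile, dx=1.0):
    """Full Width at Half Maximum of a 1-D intensity profile through a
    point target, with linear interpolation at the half-max crossings.
    dx is the sample spacing (e.g. mm per sample)."""
    half = profile.max() / 2.0
    above = np.where(profile >= half)[0]
    left, right = above[0], above[-1]
    def cross(i_a, i_b):  # interpolated crossing between two samples
        y_a, y_b = profile[i_a], profile[i_b]
        return i_a + (half - y_a) / (y_b - y_a) * (i_b - i_a)
    lx = cross(left - 1, left) if left > 0 else float(left)
    rx = cross(right, right + 1) if right < len(profile) - 1 else float(right)
    return (rx - lx) * dx

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((pred - target) ** 2)
    return 10 * np.log10(peak**2 / mse)

def contrast_ratio(inside, outside):
    """CR in dB between cyst interior and background mean intensities
    (assumed definition: |20 log10(mu_in / mu_out)|)."""
    return abs(20 * np.log10(inside.mean() / outside.mean()))

# Gaussian point-spread profile with sigma = 2 samples:
x = np.arange(-20, 21)
g = np.exp(-x**2 / (2 * 2.0**2))
print(round(fwhm(g), 3))  # 4.756 (theory: 2.355 * sigma ~= 4.71)
```

The small gap between the measured 4.756 and the theoretical 4.71 comes from linear interpolation on a coarsely sampled Gaussian; finer sampling closes it.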
Comparative experiments
Results of deep learning methods
As depicted in Fig. 4, the MD-GAN model achieves the optimal visual outcome across various targets. This model effectively circumvents evident reconstruction traces in the masked areas of images, a predicament commonly associated with SRGAN and ResGAN models. Additionally, it successfully addresses the DCGAN model’s limitation in noise removal from single-beam plane wave images. It alleviates the issue of enhanced blurriness in texture details, as typically observed in images generated by the HRGAN model. Furthermore, it resolves the UnetGAN model’s challenge in accurately reconstructing the background region within images.
A comparison of results with those of deep learning methods
Table 3 illustrates that, compared to the ResGAN model, MD-GAN achieves reductions of 0.0072 mm and 0.0016 mm in FWHM for simulated and experimental point targets, respectively. Furthermore, the contrast of the simulated and experimental cyst targets generated by MD-GAN reaches 36.8092 dB and 25.9520 dB, surpassing most prevalent deep-learning reconstruction methods.
A comparison of results with those of traditional methods
As shown in Table 4, the FWHM of the simulated point target was reduced by 0.0076 mm compared to the CF method. The simulated and experimental cyst target’s CR was improved by 3.48% and 1.71% compared to the MV method. The proposed model maintained a higher PSNR value for the generated images in the reconstruction of in vivo actual images, which was significantly better than the MV method.
Model scale comparison
As shown in Table 5, MD-GAN holds an advantage in model scale: compared with the SRGAN model, it is less computationally intensive and faster at inference despite having more parameters.
A comparison of model scale
Comparison plots of experimental results of deep learning methods.
Ablation experiment of the generator part
The effectiveness of the U-Net improvements in this model was evaluated through ablation experiments on the generator, as presented in Fig. 5. Incorporating the multi-scale convolution module significantly reduces the scatter in the generated images. Moreover, adding DCCA significantly enhanced the simulated point targets and deepened the color of the reconstructed cyst target. Furthermore, in reconstructing the actual in vivo image, adding DCCA produced a carotid artery image with a more complete overall structure.
Results of generator ablation experiments
Plot of the ablation experiment’s results for the generator part.
As presented in Table 6, in the case of point targets, incorporating DCCA reduced the FWHM of simulated point targets by 0.0092 mm. For cyst targets, including DCCA yielded a significant enhancement in CR for both simulated and experimental targets, with improvements of 25.02% and 10.54%, respectively. In the reconstruction of real in vivo images, the multi-scale convolution module increased the PSNR of the carotid reconstruction images by 3.3980 dB and improved the SSIM by 0.0464.
Ablation experiment of the loss function part
Figure 6 illustrates the experimental results. When only the original adversarial loss function was retained, the generated image failed to restore an overall structure resembling the actual image. Introducing perceptual loss on this basis still did not reconstruct the overall structure of the image. However, combining the adversarial loss with the L1-SSIM loss proved effective in reconstructing the image's global information and texture details. Moreover, combining the perceptual, adversarial, and L1-SSIM losses yielded plane-wave ultrasound images with clearer texture and higher contrast.
Results of loss function ablation experiments
Table 7 demonstrates that solely employing the traditional adversarial loss leads to inferior metrics for point target images, cyst target images, and in vivo actual images. Introducing perceptual loss on top of this degrades most of the experimental metrics. Integrating L1-SSIM with the adversarial loss yields improved metrics across all image types. Retaining L1-SSIM, perceptual, and adversarial losses together, as this model does, further enhances the experimental indexes.
Ablation experiment of the DCCA sliding window
In this study, ablation experiments were conducted to investigate the impact of the DCCA sliding windows in the U-Net decoder. Specifically, as depicted in Fig. 7, the reconstructed simulated point targets appeared noticeably weaker when only the 7 × 7 sliding window was used.
Results of the DCCA’s sliding window ablation experiments
Plot of the ablation experiment’s results for the loss function part.
Plots of experimental results for ablation of sliding windows in DCCA.
Table 8 presents the effects of the sliding windows in DCCA on the various target reconstructions. For point target reconstruction, including a 3 × 3 sliding window in the deep layer of the decoder produced more prominent point targets.
Discussion
This study proposes the MD-GAN image reconstruction method, which utilizes a U-Net generator and a Patch discriminator. The U-Net generator adds the multi-scale convolution module and DCCA to separate the image's high- and low-frequency information while enhancing its local and global features and suppressing interference from irrelevant regions on the reconstructed target. Moreover, the loss function of the model is enriched with perceptual loss and L1-SSIM loss to further enhance the quality of the generated images.
Contribution of modules
In medical image reconstruction, although self-attention is better at establishing the overall structure of an image, it can be influenced by the occluded areas of the single-beam plane-wave image when extracting global features, resulting in overall darkness in the reconstructed experimental and in vivo images.
Adding a multi-scale convolution module to the encoder of the U-Net network helps separate the high- and low-frequency information of the input image through convolutional kernels of varying sizes. This separation sharpens the boundary of the simulated point target and eliminates the scatter around it. Combined with DCCA in the decoder, the multi-scale convolution module can utilize the separated high- and low-frequency information more effectively, reducing the FWHM of the point target image and improving the PSNR of the carotid artery image.
In this study, the loss function of the MD-GAN model introduces L1-SSIM loss and perceptual loss on top of the original adversarial loss. The L1-SSIM loss preserves the grayscale values of the generated image while building a clear image structure, ensuring high structural similarity with the actual image. Additionally, the perceptual loss encourages the generated image to contain high-level semantic information similar to that of the actual image. The adversarial loss alone is insufficient to establish the true structure of the image; however, combining the adversarial, L1-SSIM, and perceptual losses leads the model to converge in the optimal direction, generating plane-wave ultrasound images with complete structure and precise details.
Model quality
The MD-GAN model exhibits excellent performance in point target images, cyst target images, and in vivo carotid artery images, and it surpasses several other evaluation models in terms of visual effects and various evaluation metrics. However, it is essential to note that deep learning-based reconstruction models achieve image-to-image reconstruction by learning the nonlinear relationship between low-resolution and high-resolution images without fully utilizing beam information. Consequently, the FWHM of the generated experimental point target images and the PSNR of carotid artery images still do not outperform the performance of beam-based reconstruction methods.
Deep learning-based methods demonstrate substantial fluctuations in overall reconstruction results in point target images. Specifically, SRGAN yields noticeably poor reconstruction results. DCGAN and HRGAN models generate the point target image with significant scatter and unsatisfactory performance across all experimental indexes. While ResGAN manages to reduce the FWHM of both simulated and experimental point target images, evident reconstruction traces are present in the occluded areas of the images. The UNetGAN model exhibits a limited ability to extract low-frequency information and fails to accurately reconstruct the background area of the experimental point target image. Introducing an attention mechanism in AUGAN enhances the model’s ability to extract high-frequency features and performs better in reconstructing simulated point target images. However, the model has limited capability for extracting global features, resulting in poor reconstruction quality in the masked region of the experimental point target image. By integrating DCCA with a relatively sizeable sliding window size into the shallow layer of the decoder, MD-GAN effectively enhances the low-frequency information extracted from the multi-scale convolution module, thereby achieving high-quality reconstruction of occluded and background regions. The utilization of the multi-scale convolution module separates the high and low-frequency information, resulting in accurate reconstruction of point target boundaries and effectively eliminating scattered spots surrounding the point target. Furthermore, incorporating DCCA with a relatively small sliding window size in the deep layer of the decoder enhances local features, resulting in a more prominent point target. Consequently, the model successfully reduces the FWHM of the generated image while producing reconstructed images with clear targets and complete structures.
Regarding the reconstruction of cyst target images, the SRGAN model yields poor visual effects and image metrics. DCGAN and HRGAN generate images with low CR, suffering from severe noise and image blurring, respectively. While ResGAN improves the CR of cyst images, it fails to recover the obscured parts of the images. The attention introduced in the AUGAN model significantly deepens the cyst target in the reconstructed image and improves the CR of the simulated cyst target; however, the model's poor ability to acquire low-frequency information results in overall darkness in the generated cyst images and slight reconstruction traces. The UNetGAN model struggles to extract the global features of the image accurately, leading to overly bright and overly dark areas in the background of the generated image; although it achieves a high CR on the experimental cyst target, the overall quality of its generated images still needs improvement. The MD-GAN model adds DCCA with sliding window sizes that increase sequentially from the deep to the shallow layers of the U-Net decoder. This approach raises the weight of local feature extraction before raising the weight of global information extraction, ensuring that the generated image possesses a more precise overall structure while highlighting the cyst target and improving the CR of the cyst target image.
The MD-GAN model outperforms the other evaluated models in reconstructing in vivo carotid artery images, primarily because those models cannot effectively extract low-frequency features or accurately reconstruct the global structure. MD-GAN combines the multi-scale convolution module with DCCA to capture low-frequency information precisely and introduces the L1-SSIM loss function to guide the structural similarity between the generated and actual images. Nevertheless, compared to traditional beam synthesis methods such as CF and GCF, the reconstruction of in vivo carotid artery images by MD-GAN still has room for improvement.
Conclusion
In this paper, we have proposed a novel end-to-end plane-wave image reconstruction network called MD-GAN. This network utilizes a U-Net-based generator and Patch discriminator, incorporating a multi-scale convolution module in the encoder of the U-Net generator to extract information of different frequencies. Additionally, DCCA is introduced in the decoder of the U-Net generator. It enhances local features, integrates global information, and suppresses the interference of unrelated regions with the reconstructed target.
Experimental results demonstrate the effectiveness of our proposed method. However, deep learning methods face challenges in effectively utilizing acoustic emission and echo information during image reconstruction. Addressing this limitation will be a crucial focus of our future research.
Footnotes
Conflict of interest
None to report.
