Abstract
MedSegLLM is a cross-modal medical image segmentation framework built on large language models (LLMs) that addresses three core challenges: high annotation cost, modality heterogeneity (e.g., the pronounced feature differences between CT and MRI), and weak cross-device generalization. The framework achieves this through three novel modules. (1) The **cross-modal semantic alignment network** uses contrastive learning to map image features and LLM-generated medical text descriptions into a shared semantic space, combining a weighted loss function with a Kullback-Leibler (KL) divergence constraint (L_KL) to strengthen semantic alignment for key medical categories (such as tumor subtypes), significantly improving cross-modal segmentation consistency (DSC rises from 0.71 to 0.91; n = 800 CT/MRI scans, 20% tumor cases). (2) The **dynamic pseudo-label generator** exploits the causal-reasoning ability of the LLM to produce high-confidence pseudo-labels from image-level annotations and medical knowledge, refining segmentation masks with an adaptive weighting strategy (α = 0.7) and raising the Dice coefficient from 0.72 to 0.83 while using 63% less annotated data. (3) The **adaptive feature fusion module** introduces Domain-Aware Normalization (DAN) and a Progressive Domain Mixing (PDM) strategy to dynamically adjust feature distributions across devices, reducing cross-device generalization error by 32% (Jensen-Shannon divergence drops from 0.68 to 0.12). Experiments show that at a low annotation rate (10%) the framework reaches a Dice coefficient of 0.91, significantly outperforming U-Net (0.85) and TransUNet (0.88), and it successfully captures 1.2 mm infiltrating lesions in CT (verified against MRI), with clinical agreement of Kappa = 0.89 and a 62% improvement in diagnostic efficiency. Its lightweight design (27M parameters, 6 FPS inference) supports intraoperative navigation and precision radiotherapy; future work will combine a 3D spatiotemporal Transformer with knowledge distillation (≤10M parameters) to broaden the accessibility of precision medicine.
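As a minimal sketch of the two objectives implied above (the abstract does not give the exact formulations; $\lambda$, $p_{\mathrm{img}}$, $p_{\mathrm{text}}$, $M_{\mathrm{pseudo}}$, and $M_{\mathrm{pred}}$ are illustrative names), the alignment loss can be read as a weighted segmentation loss plus the KL-divergence constraint between image-side and text-side category distributions, and the mask refinement as a convex combination of the LLM-derived pseudo-label and the model prediction using the reported α = 0.7:

$$
\mathcal{L} = \mathcal{L}_{\mathrm{seg}} + \lambda\, L_{\mathrm{KL}}, \qquad
L_{\mathrm{KL}} = D_{\mathrm{KL}}\!\left(p_{\mathrm{img}} \,\|\, p_{\mathrm{text}}\right), \qquad
\hat{M} = \alpha\, M_{\mathrm{pseudo}} + (1-\alpha)\, M_{\mathrm{pred}}, \quad \alpha = 0.7.
$$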
