Abstract
MedSegLLM is a cross-modal medical image segmentation framework built on large language models (LLMs) that addresses three core challenges: high annotation cost, modality heterogeneity (e.g., the pronounced feature differences between CT and MRI), and weak cross-device generalization. The framework achieves this through three novel modules. (1) The **cross-modal semantic alignment network** uses contrastive learning to map image features and LLM-generated medical text descriptions into a shared semantic space, combining a weighted loss function with a Kullback-Leibler (KL) divergence constraint (L_KL) to strengthen semantic alignment for key medical categories (such as tumor subtypes), significantly improving cross-modal segmentation consistency (DSC rises from 0.71 to 0.91; n = 800 CT/MRI scans, 20% tumor cases). (2) The **dynamic pseudo-label generator** exploits the causal-reasoning ability of the LLM to produce high-confidence pseudo-labels from image-level annotations and medical knowledge, refining segmentation masks with an adaptive weighting strategy (α = 0.7) and raising the Dice coefficient from 0.72 to 0.83 while using 63% less annotated data. (3) The **adaptive feature fusion module** introduces Domain-Aware Normalization (DAN) and a Progressive Domain Mixing (PDM) strategy to dynamically adjust feature distributions across devices, reducing cross-device generalization error by 32% (Jensen-Shannon divergence drops from 0.68 to 0.12). Experiments show that at a low annotation rate (10%) the framework reaches a Dice coefficient of 0.91, significantly outperforming U-Net (0.85) and TransUNet (0.88), and it successfully captures 1.2 mm infiltrating lesions in CT (verified against MRI), with clinical agreement of Kappa = 0.89 and a 62% improvement in diagnostic efficiency. Its lightweight design (27M parameters, 6 FPS inference) supports intraoperative navigation and precision radiotherapy; future work will combine a 3D spatiotemporal Transformer with knowledge distillation (≤10M parameters) to broaden the accessibility of precision medicine.
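As a minimal sketch of the two objectives implied above (the abstract does not give the exact formulations; $\lambda$, $p_{\mathrm{img}}$, $p_{\mathrm{text}}$, $M_{\mathrm{pseudo}}$, and $M_{\mathrm{pred}}$ are illustrative names), the alignment loss can be read as a weighted segmentation loss plus the KL-divergence constraint between image-side and text-side category distributions, and the mask refinement as a convex combination of the LLM-derived pseudo-label and the model prediction using the reported α = 0.7:

$$
\mathcal{L} = \mathcal{L}_{\mathrm{seg}} + \lambda\, L_{\mathrm{KL}}, \qquad
L_{\mathrm{KL}} = D_{\mathrm{KL}}\!\left(p_{\mathrm{img}} \,\|\, p_{\mathrm{text}}\right), \qquad
\hat{M} = \alpha\, M_{\mathrm{pseudo}} + (1-\alpha)\, M_{\mathrm{pred}}, \quad \alpha = 0.7.
$$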
