Sage Journals: Discover world-class research

Abstract

Purpose

A lightweight deep learning network, SMA-Net, was proposed to automatically segment skeletal muscle at the third lumbar vertebra (L3) level in patients receiving radiotherapy for cervical cancer, and its segmentation performance was evaluated.

Methods and Materials

A total of 160 eligible patients with cervical cancer admitted to the oncology department of our hospital from September 2021 to June 2024 were randomly divided into a training set (N = 112), a validation set (N = 16) and a test set (N = 32) at a ratio of 7 : 1 : 2. A lightweight Mamba architecture was introduced into the UNet network, and self-attention block (SAB) and cross-attention block (CAB) mechanisms were added to the skip connections to suppress irrelevant information in the image and highlight important local features. On the test set, the trained network was evaluated with geometric segmentation metrics, and the skeletal muscle area (SMA) predicted by the network was compared with that obtained by manual segmentation. The parameter count and computational cost of SMA-Net were compared with those of existing networks.

Results

The Dice similarity coefficient (DSC) of SMA-Net for skeletal muscle segmentation was 89.16%, the sensitivity (SEN) was 88.21%, the positive predictive value (PPV) was 90.13% and the 95% Hausdorff distance was 5.30 mm. The SMA predicted by the network closely agreed with that obtained by manual segmentation. For sarcopenia prediction in patients with cervical cancer, the network achieved 87.5% accuracy, 92.31% precision, 80% recall, an 85.71% F1-score and an AUC of 0.871. The computational cost of SMA-Net was 1.50 GFLOPS and its parameter count was 1.24 M. According to the radiologist's scores, 93.75% of the automatic skeletal muscle segmentations required only minor or no revision.

Conclusion

The lightweight SMA-Net proposed in this study can accurately segment L3 skeletal muscle and quickly calculate its area; it largely meets clinical requirements and is convenient for clinical deployment. It can help clinicians quickly diagnose sarcopenia in patients with cervical cancer, save medical resources, reduce physicians' workload and improve diagnostic efficiency.

Keywords

cervical cancer, deep learning, intelligent segmentation, skeletal muscle

Introduction

Cervical cancer is a malignant tumor of the female genital system and ranks as the fourth most common neoplasm in women worldwide.¹ Neoadjuvant chemotherapy and surgery, or induction chemotherapy followed by chemoradiotherapy, had been the standard of care in the management of cervical cancer.^2,3 Despite notable advances in radiotherapeutic and chemotherapeutic approaches for cervical cancer, the tumor target and the pelvic lymph node regions lie within the high-dose radiation field, so the bowel within the pelvic cavity is also irradiated and its mucosa damaged. Patients may experience compromised food intake, digestion and nutrient absorption, which aggravates their nutritional status.⁴ These adverse effects, such as weight loss, nutrient malabsorption and even changes in body composition including skeletal muscle,^5,6 ultimately lead to adverse clinical outcomes: treatment interruptions, infections and longer hospital stays.⁷ Sarcopenia is often defined as a reduction in skeletal muscle mass (SMM). Its occurrence and progression seriously affect the quality of life and mortality of cancer patients.⁸ The skeletal muscle index (SMI) measured at the third lumbar vertebral (L3) level by computed tomography (CT) is a clinically accepted method for diagnosing sarcopenia.⁶ SMI is the skeletal muscle area (SMA) normalized to the square of height and expressed in cm²/m². Calculating SMI requires oncologists to segment the SMA manually. Accurate manual segmentation is not only time-consuming and labor-intensive but also prone to intra- and inter-observer variation among oncologists with different levels of experience. In recent years, advances in deep learning and computing resources have provided new opportunities to revisit such manual, time-consuming and routine tasks. Deep learning has been shown to be particularly well suited to semantic segmentation tasks.⁹ It therefore appears both reasonable and necessary to develop a fast and effective deep learning tool for the automatic segmentation of skeletal muscle.

Although deep learning has made breakthroughs in medical image segmentation (for example, U-Net and its variants),^10,11 automatically segmenting skeletal muscle in patients with cervical cancer remains challenging because of the complexity of the anatomical structures. Amarasinghe et al¹² constructed a 2.5D UNet to segment L3 skeletal muscle on CT images of patients with non-small cell lung cancer and obtained a mean Dice score of 0.92. Naser et al¹³ used 2D and 3D ResUNets to segment skeletal muscle at the third cervical vertebra level, with Dice scores of 0.95 and 0.96, respectively. Automatic segmentation of skeletal muscle on radiotherapy CT of cervical cancer patients still faces unique challenges. Pelvic anatomy is affected by tumor invasion and radiotherapy-induced edema, which blurs muscle boundaries. In addition, fat infiltration caused by chemoradiotherapy may change the distribution of CT values in muscle tissue. Moreover, these CNN-based segmentation networks mainly extract local features, and their large parameter counts and high computational costs hinder clinical deployment on medical equipment. Recently, some researchers have introduced the state space model represented by Mamba into the classical UNet, which overcomes the limited long-range modeling ability of convolution and achieves both high segmentation accuracy and low computational cost on skin lesion datasets.¹⁴ We therefore propose a lightweight network to segment skeletal muscle on planning CT for cervical cancer radiotherapy.

Materials and Methods

Study Design

Data from 160 patients who received radiotherapy for cervical cancer at our hospital between 2021 and 2024 were collected. Skeletal muscle at the L3 level was manually annotated for each patient. The planning CT images were acquired on a Brilliance CT Big Bore scanner (Philips Healthcare, Best, Netherlands) with a standard clinical protocol: 120 kVp, 120–300 mAs, 0.8 × 0.8 mm pixel spacing, 512 × 512 image size and 3 mm slice thickness. A series of curation and preprocessing procedures was performed to ensure that all data met the quality and standardization requirements of the segmentation and predictive models.

Image Segmentation and Preprocessing

A total of 26,246 CT slices were manually segmented to train the automatic skeletal muscle segmentation network. Manual segmentation was performed by a junior radiologist in the Eclipse 13.6 treatment planning system (Varian Medical Systems Inc, USA). Muscle was separated from adipose tissue with the 'Region-Growing' tool using the body-composition Hounsfield unit (HU) threshold of −29 to +150 HU.¹⁵ The final manual segmentations were reviewed and approved by a senior radiologist with more than 20 years of experience. The segmented skeletal muscle included the rectus abdominis, abdominal wall muscles, quadratus lumborum, psoas and erector spinae. Figure 1 shows the detailed skeletal muscle segmentation on CT images. Data augmentation included random transformations, intensity range scaling, random cropping (320 × 320), random flipping (p = 0.5), random rotation (±10°) and random intensity shifting ([0, 1]).
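The HU-threshold step above can be illustrated with a minimal sketch. It is not the Eclipse 'Region-Growing' tool: it assumes the slice is a 2D list of HU values and that `roi` is a hypothetical 0/1 mask of the manually outlined muscle compartment, keeps pixels whose HU lies in the −29 to +150 HU range, and converts the pixel count to an area using the 0.8 × 0.8 mm pixel spacing of the acquisition protocol.

```python
# Sketch only: threshold-based muscle mask inside a hypothetical ROI, plus its area.
MUSCLE_HU_MIN = -29   # lower HU bound for skeletal muscle
MUSCLE_HU_MAX = 150   # upper HU bound for skeletal muscle


def muscle_mask(hu_slice, roi):
    """Return a 0/1 mask: 1 where the pixel is inside `roi` and its HU is in [-29, 150]."""
    return [
        [1 if inside and MUSCLE_HU_MIN <= hu <= MUSCLE_HU_MAX else 0
         for hu, inside in zip(hu_row, roi_row)]
        for hu_row, roi_row in zip(hu_slice, roi)
    ]


def area_cm2(mask, spacing_mm=(0.8, 0.8)):
    """Pixel count times pixel area (mm^2), converted to cm^2 (1 cm^2 = 100 mm^2)."""
    n_pixels = sum(sum(row) for row in mask)
    return n_pixels * spacing_mm[0] * spacing_mm[1] / 100.0


hu = [[-100, -29], [150, 200]]   # toy 2 x 2 slice in HU
roi = [[1, 1], [1, 1]]           # hypothetical outlined region
mask = muscle_mask(hu, roi)      # [[0, 1], [1, 0]]
print(mask, area_cm2(mask))      # 2 pixels x 0.64 mm^2, about 0.0128 cm^2
```

Both threshold bounds are inclusive here; whether the planning system treats the limits as inclusive is an assumption.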

Figure 1.

Detailed segmentation diagram of the skeletal muscle

Development of Lightweight Segmentation Network

An illustration of our research workflow is shown in Figure 2. The workflow comprises six components: manual segmentation of skeletal muscle in the treatment planning system (TPS), preprocessing before model training, comparison of model predictions, three-dimensional reconstruction, and an evaluation consisting of quantitative geometric metrics and clinical assessment. Existing lightweight semantic segmentation networks usually use depth-wise, group-wise and factorized convolutions to compress network size. Recently, State Space Models (SSMs) represented by Mamba have been introduced into the classical UNet convolutional neural network, overcoming the limited long-range modeling ability of convolution and achieving both high segmentation accuracy and a lightweight network on skin lesion datasets.¹⁴ Ruan et al¹⁶ first developed a medical image segmentation model based on the Mamba architecture for skin lesion segmentation. Wu et al¹⁴ analyzed the key factors that determine the parameter count of the Mamba architecture and proposed a parallel vision Mamba layer, used in UltraLight VM-UNet, that achieves the lowest computational load with excellent performance while keeping the total number of channels constant.
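As a rough illustration of why depth-wise (separable) convolutions compress a network, the sketch below counts weights and multiply-accumulate operations (MACs) for one k × k layer, ignoring biases and assuming stride 1 with 'same' padding. The channel numbers in the example (64 → 96 at 80 × 80) are chosen only for illustration; one MAC is commonly counted as two FLOPs.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Weights and MACs of a standard k x k convolution (no bias)."""
    params = k * k * c_in * c_out
    macs = params * h * w          # each weight is used once per output pixel
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h, w):
    """A depth-wise k x k filter per input channel followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    params = depthwise + pointwise
    macs = params * h * w
    return params, macs


std = standard_conv_cost(64, 96, 3, 80, 80)
sep = depthwise_separable_cost(64, 96, 3, 80, 80)
print(std, sep, round(std[0] / sep[0], 1))   # (55296, 353894400) (6720, 43008000) 8.2
```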

Figure 2.

The workflow of this work

In this work, a new architecture called SMA-Net, a typical U-Net variant, is proposed and presented in Figure 3. Based on UltraLight VM-UNet, we propose Multi-Hybrid Convolutional (MHC) modules and introduce Multi-Scale Attention Aggregation (MSAA). The MHC module consists of a Parallel Vision Mamba (PVM) layer¹⁴ and Omni-Dimensional Dynamic Convolution (ODConv)¹⁷ arranged in parallel. The performance of U-Net-based segmentation models may be weakened if encoder and decoder feature maps are concatenated directly through skip connections, because features at different depths differ semantically. A Self-Attention Block (SAB) and a Cross-Attention Block (CAB) were therefore used to reduce the semantic differences between feature layers at different depths.

Figure 3.

Detailed segmentation network of the skeletal muscle

MHC Module

The MHC module consists of PVM and dynamic convolution (ODConv) in parallel. ODConv is an advanced dynamic convolution design that aims to improve the feature learning ability of convolutional neural networks through a multi-dimensional attention mechanism. Unlike earlier dynamic convolutions that apply attention only along the dimension of the number of convolution kernels, ODConv learns complementary attention along four dimensions of the kernel space: spatial size, number of input channels, number of output channels and number of kernels. Both PVM and ODConv are lightweight, and together they provide local and global feature extraction. In the PVM branch, a feature X with C channels first passes through a LayerNorm (LN) layer and is then split (Sp) along the channel dimension into four features $Y_{i}^{C / 4}$ (i = 1, 2, 3, 4). Each feature is processed by a Mamba block with a residual connection scaled by θ; the four outputs are concatenated (Cat) into a feature $X_{out}$ with C channels, which is normalized and projected (Pro) to the output. These operations can be expressed by the following equations:

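$$Y_{1}^{C/4}, Y_{2}^{C/4}, Y_{3}^{C/4}, Y_{4}^{C/4} = \mathrm{Sp}\left[\mathrm{LN}\left(X_{in}^{C}\right)\right] \tag{1}$$

$$\mathrm{VM}\_Y_{i}^{C/4} = \mathrm{Mamba}\left(Y_{i}^{C/4}\right) + \theta \cdot Y_{i}^{C/4}, \quad i = 1, 2, 3, 4 \tag{2}$$

$$X_{out} = \mathrm{Cat}\left(\mathrm{VM}\_Y_{1}^{C/4}, \mathrm{VM}\_Y_{2}^{C/4}, \mathrm{VM}\_Y_{3}^{C/4}, \mathrm{VM}\_Y_{4}^{C/4}\right) \tag{3}$$

$$Out = \mathrm{Pro}\left[\mathrm{LN}\left(X_{out}\right)\right] \tag{4}$$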
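Equations (1)–(4) can be traced with the following minimal sketch, assuming a feature map flattened into n tokens of C channel values. The function `stand_in_mixer` is a hypothetical placeholder for the Mamba block (a running mean over tokens, chosen only because it mixes information along the sequence in linear time); `proj` stands in for the projection Pro, and θ is fixed rather than learned. In this sketch one mixer is shared by all four C/4-wide chunks.

```python
import math


def layer_norm(vec, eps=1e-5):
    """LN over the channel values of one token (learnable affine omitted)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def stand_in_mixer(seq):
    """Hypothetical placeholder for Mamba: running mean of the tokens seen so far."""
    out, acc = [], None
    for t, tok in enumerate(seq, start=1):
        acc = list(tok) if acc is None else [a + v for a, v in zip(acc, tok)]
        out.append([a / t for a in acc])
    return out


def pvm_layer(x, mixer, proj, theta=1.0, groups=4):
    """x: list of n tokens, each a list of C channel values (C divisible by `groups`)."""
    c = len(x[0])
    assert c % groups == 0, "channel count must be divisible by the number of groups"
    width = c // groups
    x_norm = [layer_norm(tok) for tok in x]                              # LN(X_in)
    chunks = [[tok[i * width:(i + 1) * width] for tok in x_norm]         # Sp[...], Eq. (1)
              for i in range(groups)]
    vm = []
    for y in chunks:                                                     # Eq. (2)
        mixed = mixer(y)                                                 # shared mixer
        vm.append([[m + theta * v for m, v in zip(m_tok, y_tok)]
                   for m_tok, y_tok in zip(mixed, y)])
    x_out = [[v for i in range(groups) for v in vm[i][t]]                # Cat, Eq. (3)
             for t in range(len(x))]
    return [proj(layer_norm(tok)) for tok in x_out]                      # Pro[LN], Eq. (4)


tokens = [[float(t + ch) for ch in range(8)] for t in range(3)]          # n = 3, C = 8
out = pvm_layer(tokens, stand_in_mixer, proj=lambda v: v)               # identity projection
print(len(out), len(out[0]))                                             # 3 8
```

A simplified view of the parameter saving: if a block's weights grow roughly with the square of its width, as a dense C × C projection does, one block shared across four C/4-wide chunks needs about (C/4)² = C²/16 of those weights; for C = 128 that is 1,024 instead of 16,384.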

The core advantage of the MHC module is that it combines the ability to capture long-range dependencies (PVM) with strong, input-adaptive local feature extraction (ODConv) at low computational cost. The two parallel branches complement each other and mitigate the respective limitations of traditional convolution and of the Mamba model.

MSAA Module

The deepest encoder and decoder layers are connected by the MSAA module, which aggregates features along dual spatial and channel paths. Channel refinement starts with a channel projection by a 1 × 1 convolution that reduces the number of channels from C1 to C2. Multi-scale fusion sums the outputs of convolutions with different kernel sizes, such as 3 × 3, 5 × 5 and 7 × 7. Mean pooling and max pooling are then used to aggregate spatial features, followed by a 7 × 7 convolution; the result is passed through a sigmoid and multiplied element-wise with the feature map. By aggregating features at different scales, the MSAA module aims to capture multi-scale context information better, which can enhance segmentation details such as muscle edges and thus improve the clarity of the segmentation results.
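The multi-scale fusion and spatial aggregation steps can be sketched on small nested lists as follows: a sum of convolutions with different kernel sizes, then channel-wise mean and max pooling, a convolution, a sigmoid and element-wise reweighting. The convolution weights are hypothetical (learned in practice), the helper accepts any odd kernel size, and the 1 × 1 projection and the channel path are omitted.

```python
import math


def conv2d_same(maps, weights):
    """Zero-padded 'same' 2D cross-correlation of several input maps into one output map.
    weights[c] is a k x k kernel (k odd) applied to maps[c]."""
    h, w = len(maps[0]), len(maps[0][0])
    k = len(weights[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for fmap, kern in zip(maps, weights):
                for u in range(k):
                    for v in range(k):
                        y, x = i + u - r, j + v - r
                        if 0 <= y < h and 0 <= x < w:
                            s += kern[u][v] * fmap[y][x]
            out[i][j] = s
    return out


def multi_scale_fusion(fmap, kernels):
    """Sum of convolutions of one map with kernels of different sizes (e.g. 3, 5, 7)."""
    outs = [conv2d_same([fmap], [kern]) for kern in kernels]
    return [[sum(o[i][j] for o in outs) for j in range(len(fmap[0]))] for i in range(len(fmap))]


def spatial_aggregation(feat, weights):
    """feat: C maps of H x W. Pool across channels, convolve, sigmoid, reweight every channel."""
    h, w = len(feat[0]), len(feat[0][0])
    mean_map = [[sum(f[i][j] for f in feat) / len(feat) for j in range(w)] for i in range(h)]
    max_map = [[max(f[i][j] for f in feat) for j in range(w)] for i in range(h)]
    logits = conv2d_same([mean_map, max_map], weights)      # 2 input maps -> 1 map
    attn = [[1.0 / (1.0 + math.exp(-z)) for z in row] for row in logits]
    return [[[f[i][j] * attn[i][j] for j in range(w)] for i in range(h)] for f in feat]


def delta_kernel(k):
    """k x k kernel with a single 1 at the centre (an identity filter)."""
    return [[1.0 if (u, v) == (k // 2, k // 2) else 0.0 for v in range(k)] for u in range(k)]


fmap = [[1.0, 2.0], [3.0, 4.0]]
print(multi_scale_fusion(fmap, [delta_kernel(3), delta_kernel(5), delta_kernel(7)]))
# identity kernels at three scales give 3 x the input: [[3.0, 6.0], [9.0, 12.0]]
```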

SAB and CAB Module

The skip-connection path uses the SAB and CAB modules,¹⁴ whose combination fuses multi-stage features at different scales. Consider an input sequence $X \in \mathbb{R}^{n \times d}$, where n is the sequence length and d is the feature dimension. Three linear transformations generate the query (Q), key (K) and value (V) matrices from X (Eqs. 5–7). A softmax over the scaled dot products of Q and K gives the attention weight matrix A (Eq. 8), which weights and sums the rows of V to produce the output sequence Z (Eq. 9); $d_k$ is the dimension of the keys. Multi-head attention runs h such heads in parallel and combines them with the output matrix $W^{O}$ (Eqs. 11–12). SAB applies multi-head attention of X to itself (Eq. 10), whereas CAB takes the queries from X and the keys and values from a second feature sequence Y (Eqs. 13–14); both add a residual connection followed by LayerNorm. Further details are given in ref. 14.

$$Q = X W^{Q} \tag{5}$$

$$K = X W^{K} \tag{6}$$

$$V = X W^{V} \tag{7}$$

$$A = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) \tag{8}$$

$$\mathrm{Attention}(Q, K, V) = A V \tag{9}$$

$$\mathrm{SAB}(X) = \mathrm{LayerNorm}\left(X + \mathrm{MultiHead}(X, X, X)\right) \tag{10}$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_{1}, \dots, \mathrm{head}_{h}\right) W^{O} \tag{11}$$

$$\mathrm{head}_{i} = \mathrm{Attention}\left(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V}\right) \tag{12}$$

$$\mathrm{CrossAttention}(X, Y) = \mathrm{softmax}\left(\frac{(X W^{Q})(Y W^{K})^{T}}{\sqrt{d_{k}}}\right) Y W^{V} \tag{13}$$

$$\mathrm{CAB}(X, Y) = \mathrm{LayerNorm}\left(X + \mathrm{MultiHead}(X, Y, Y)\right) \tag{14}$$
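A single-head, pure-Python sketch of Eqs. (8)–(10) and (14) follows. It assumes small dense matrices stored as nested lists, uses identity matrices for $W^{Q}$, $W^{K}$ and $W^{V}$ in the example, and drops $W^{O}$ (equivalent to h = 1 with $W^{O}$ = I); the multi-head split of Eqs. (11)–(12) and the learnable LayerNorm parameters are omitted. It illustrates the equations as written, not the released code of ref. 14.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    """Eqs. (8)-(9): A = softmax(Q K^T / sqrt(d_k)); output = A V."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def layer_norm_rows(m, eps=1e-5):
    out = []
    for row in m:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def cab(x, y, w_q, w_k, w_v):
    """Single-head Eq. (14): LayerNorm(X + Attention(X W^Q, Y W^K, Y W^V))."""
    z = attention(matmul(x, w_q), matmul(y, w_k), matmul(y, w_v))
    return layer_norm_rows([[a + b for a, b in zip(rx, rz)] for rx, rz in zip(x, z)])


def sab(x, w_q, w_k, w_v):
    """Single-head Eq. (10): self-attention is cross-attention of X with itself."""
    return cab(x, x, w_q, w_k, w_v)


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # n = 3 tokens, d = 2
y = [[2.0, 0.0], [0.0, 2.0]]                    # second sequence with m = 2 tokens
print(sab(x, identity, identity, identity))     # 3 x 2
print(cab(x, y, identity, identity, identity))  # 3 x 2: queries from X, keys/values from Y
```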

The core contribution of PVM lies in its parallel design, which effectively constrains the growth of the Mamba model's parameters and thereby makes the model extremely lightweight without significantly sacrificing performance. ODConv uses a multi-dimensional attention mechanism with a parallel strategy to learn complementary attention for the convolutional kernels along all four dimensions of the kernel space of any convolutional layer. Compared with earlier dynamic convolutions, it can reach comparable performance with fewer candidate kernels, reducing redundant parameters. The MSAA module consists of a multi-scale fusion module, a spatial aggregation module and a channel aggregation module. Multi-scale fusion sums convolutions with different kernel sizes, such as 3 × 3, 5 × 5 and 7 × 7. Spatial aggregation applies mean pooling and max pooling, followed by a 7 × 7 convolution and element-wise multiplication with the sigmoid-activated map. Meanwhile, channel aggregation uses global average pooling to reduce the spatial dimensions to 1 and then generates a channel attention map through convolution and ReLU activation. This map is expanded to match the input dimensions and combined with the spatial refinement map. MSAA therefore enriches the resulting feature map with refined spatial and channel information for subsequent network layers. The network was implemented with PyTorch 1.13.0, CUDA 11.7 and Python 3.8.19, based on the reference implementation of U-Net. SMA-Net consists of an encoder on the left (which extracts image features) and a decoder on the right (which reconstructs the segmentation map from the extracted features). The input CT image has a size of 3 × 320 × 320.
First, a convolutional layer extracts shallow features, giving a 32 × 320 × 320 feature map. A GroupNorm layer then normalizes the data so that, within each group of feature channels, the values have zero mean and unit variance. This reduces scale differences between feature channels and makes it easier for the network to learn correlations between features. Next, max pooling is used for down-sampling; it does not change the number of channels but halves the width and height, so the feature map becomes 32 × 160 × 160. A GELU activation is then applied. The smoothness of GELU helps mitigate the vanishing-gradient problem during training, so that the network can better learn the characteristics of the input data and converge faster. Repeating these operations twice more yields a 96 × 40 × 40 feature map.
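The per-stage operations described above can be sketched as follows: GroupNorm over groups of channels, 2 × 2 max pooling with stride 2, the exact GELU, and the resulting shape bookkeeping. The number of groups, the pooling window and the omission of GroupNorm's learnable scale and shift are assumptions made for illustration.

```python
import math


def group_norm(feat, num_groups, eps=1e-5):
    """feat: C channels of H x W (C divisible by num_groups). Normalizes each group of
    C/num_groups channels to zero mean and unit variance (scale and shift omitted)."""
    size = len(feat) // num_groups
    out = []
    for start in range(0, len(feat), size):
        group = feat[start:start + size]
        vals = [v for ch in group for row in ch for v in row]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals) + eps)
        out.extend([[[(v - mean) / std for v in row] for row in ch] for ch in group])
    return out


def max_pool_2x2(channel):
    """2 x 2 max pooling, stride 2: halves H and W and leaves the channel count unchanged."""
    return [[max(channel[i][j], channel[i][j + 1], channel[i + 1][j], channel[i + 1][j + 1])
             for j in range(0, len(channel[0]) - 1, 2)]
            for i in range(0, len(channel) - 1, 2)]


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def conv_stage_shape(shape, out_channels):
    """Shape after conv (to out_channels) -> GroupNorm -> MaxPool(2) -> GELU."""
    _, h, w = shape
    return (out_channels, h // 2, w // 2)


shape = (3, 320, 320)
for channels in (32, 64, 96):
    shape = conv_stage_shape(shape, channels)
    print(shape)   # (32, 160, 160), then (64, 80, 80), then (96, 40, 40)
```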

This feature map then passes through three MHC modules. The Mamba architecture, a state space model, is known for its ability to process long sequences and global context with improved computational efficiency. It can effectively model long-range dependencies in medical images, which is key to accurate segmentation, and the visual state space (VSS) model represented by Mamba has become a promising method. To further improve computational efficiency, the VSS block removes the entire multiplicative gating branch, because the gating effect is already achieved by the selectivity of the 2D selective scan. The VSS module therefore consists of a single network branch with two residual modules, imitating the architecture of a standard Transformer block. The MHC module extracts deeper features and increases the number of channels, giving a 128 × 40 × 40 feature map. MHC is applied twice more, and the deepest feature layer is 256 × 10 × 10. In the skip-connection path in the middle of the network, SAB and CAB¹⁴ fuse multi-scale information. The decoder on the right side progressively restores the spatial size of the feature maps. The numbers of channels in the network stages are set to 32, 64, 96, 128, 192 and 256, respectively. This design helps capture image details and context at different levels.
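The selective state space recurrence behind the Mamba and VSS blocks can be sketched for a single channel as follows. It assumes a diagonal state matrix and the discretization used in Mamba's reference scan, in which the state decays by exp(Δ·a) and the input enters as Δ·B·x. In the real model Δ, B and C are produced from the input by learned projections, and VSS blocks scan the image along four directions (2D selective scan); both are omitted here. Because the state is updated once per token, the cost grows linearly with sequence length.

```python
import math


def selective_scan(x, delta, a, b, c):
    """One-channel selective SSM scan in a single O(L) pass.

    x, delta : length-L lists (delta_t > 0 is input-dependent in Mamba)
    a        : N diagonal state coefficients (negative, so the state decays)
    b, c     : L x N lists of input-dependent projections
    h_t = exp(delta_t * a) * h_{t-1} + delta_t * b_t * x_t ;  y_t = <c_t, h_t>
    """
    n = len(a)
    h = [0.0] * n
    ys = []
    for t, x_t in enumerate(x):
        h = [math.exp(delta[t] * a[s]) * h[s] + delta[t] * b[t][s] * x_t for s in range(n)]
        ys.append(sum(c[t][s] * h[s] for s in range(n)))
    return ys


# An impulse decays by exp(-1) per step when delta = 1 and a = -1.
print(selective_scan([1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [-1.0], [[1.0]] * 3, [[1.0]] * 3))
# A large delta at step 2 (delta * a = -20) almost erases the earlier state ("forget").
print(selective_scan([1.0, 0.0], [1.0, 20.0], [-1.0], [[1.0]] * 2, [[1.0]] * 2))
```

The input-dependent Δ is what makes the scan selective: a small Δ keeps the previous state, whereas a large Δ discards it and lets the current input dominate.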

Model Training

The 160 annotated CT scans were randomly divided into a training set (N = 116), a validation set (N = 16) and a testing set (N = 32) at a ratio of 7 : 1 : 2. Adam was used as the optimizer. The total loss function is a weighted combination of the Dice loss (L_dice) and the cross-entropy loss (L_ce), as shown in Eq. (15); the formulas of L_dice and L_ce are detailed in ref. 18.

$$Loss = 0.75 \times L_{dice}\left(S_{sma}, G_{sma}\right) + 0.25 \times L_{ce}\left(S_{sma}, G_{sma}\right) \tag{15}$$

$$L_{dice}(s, g) = -\frac{2 \sum_{j=1}^{n} s_{j} g_{j}}{\sum_{j=1}^{n} s_{j} + \sum_{j=1}^{n} g_{j}} \tag{16}$$

$$L_{ce}(s, g) = -\frac{1}{n} \sum_{j=1}^{n} \left[ g_{j} \log s_{j} + \left(1 - g_{j}\right) \log\left(1 - s_{j}\right) \right] \tag{17}$$

Here $S_{sma}$ and $G_{sma}$ denote the predicted and ground-truth skeletal muscle segmentations, n is the number of voxels in the input CT data, $s_j$ is the predicted probability that the j-th voxel belongs to skeletal muscle, and $g_j \in \{0, 1\}$ is the ground-truth label of that voxel.
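Equations (15)–(17) translate directly into the following sketch over flattened voxel lists. Clipping the probabilities to [ε, 1 − ε] and the handling of an empty prediction and ground truth are assumptions added so that the functions never evaluate log(0) or divide by zero; they are not stated in the text.

```python
import math


def dice_loss(s, g):
    """Eq. (16): negative soft Dice, from -1 (perfect overlap) to 0 (no overlap)."""
    denom = sum(s) + sum(g)
    if denom == 0:               # both empty: treated as perfect agreement (assumption)
        return -1.0
    return -2.0 * sum(sj * gj for sj, gj in zip(s, g)) / denom


def ce_loss(s, g, eps=1e-7):
    """Eq. (17): binary cross-entropy averaged over the n voxels."""
    total = 0.0
    for sj, gj in zip(s, g):
        sj = min(max(sj, eps), 1.0 - eps)   # clip to avoid log(0)
        total += gj * math.log(sj) + (1.0 - gj) * math.log(1.0 - sj)
    return -total / len(s)


def total_loss(s, g, w_dice=0.75, w_ce=0.25):
    """Eq. (15): weighted sum of the Dice and cross-entropy terms."""
    return w_dice * dice_loss(s, g) + w_ce * ce_loss(s, g)


gt = [1, 0, 1, 0]
print(total_loss([1.0, 0.0, 1.0, 0.0], gt))   # close to -0.75 (Dice term -1, CE term ~0)
print(total_loss([0.5, 0.5, 0.5, 0.5], gt))   # 0.75 * -0.5 + 0.25 * ln 2, about -0.2017
```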

A cosine annealing learning-rate scheduler was used (initial learning rate 10⁻⁴, weight decay 10⁻⁴). Training was terminated when the loss had not decreased for 100 epochs. The batch size was 16. Data augmentation such as rotation, translation, flipping and cropping was applied. Pre-trained weights obtained from large-scale training on the ImageNet dataset¹⁹ were used. The networks were run on a server with an NVIDIA A100 GPU and 32 GB of RAM.
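The learning-rate schedule and stopping rule can be sketched as below. The closed form is the standard single-cycle cosine annealing formula (the form PyTorch documents for `torch.optim.lr_scheduler.CosineAnnealingLR` without restarts). The cycle length `t_max` and minimum rate `lr_min` are not reported in the text and are placeholders; the maximum rate of 10⁻⁴ and the 100-epoch patience come from the text.

```python
import math


def cosine_annealing_lr(epoch, t_max, lr_max=1e-4, lr_min=0.0):
    """lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * epoch / t_max))."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * epoch / t_max))


class EarlyStopping:
    """Signals a stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=100):
        self.patience = patience
        self.best = math.inf
        self.epochs_without_improvement = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


t_max = 300   # hypothetical cycle length in epochs
print([cosine_annealing_lr(e, t_max) for e in (0, 150, 300)])   # about 1e-4, 5e-5, 0
stopper = EarlyStopping(patience=3)
print([stopper.step(loss) for loss in (1.0, 0.9, 0.95, 0.91, 0.92)])
# [False, False, False, False, True]
```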

Evaluation

The automatic segmentation performance was evaluated with geometric metrics and by clinical evaluation. The geometric metrics are the Dice similarity coefficient (DSC), sensitivity (SEN), positive predictive value (PPV) and 95% Hausdorff distance (95HD), defined as follows.

$$\mathrm{DSC} = \frac{2 \times TP}{FP + 2TP + FN} \times 100\% \tag{18}$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \times 100\% \tag{19}$$

$$\mathrm{SEN} = \frac{TP}{TP + FN} \times 100\% \tag{20}$$

$$\mathrm{HD}(C, P) = \max\left(h(C, P), h(P, C)\right) \tag{21}$$

where TP, TN, FP and FN denote the numbers of true positive, true negative, false positive and false negative samples, respectively, and $h(C, P) = \max_{c \in C} \min_{p \in P} \lVert c - p \rVert$ and $h(P, C) = \max_{p \in P} \min_{c \in C} \lVert p - c \rVert$ are the directed Hausdorff distances. HD measures the distance between the surface point sets of the ground truth (C) and the prediction (P). To reduce the influence of outliers, the 95th-percentile HD (95HD) is used; it is the largest surface-to-surface distance among the closest 95% of surface voxels.
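Equations (18)–(21) and the 95HD can be computed as in the following sketch. Masks are flat 0/1 lists and surfaces are lists of point coordinates, assumed to be already in millimetres (voxel indices multiplied by the pixel spacing). Nearest neighbours are found by brute force. The 95HD shown here, the larger of the two directed 95th-percentile distances with linear interpolation, is one common variant; toolkits differ in the exact definition.

```python
import math


def overlap_metrics(pred, gt):
    """Eqs. (18)-(20) on flat 0/1 masks; returns (DSC, PPV, SEN) in percent."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if g and not p)
    dsc = 100.0 * 2 * tp / (fp + 2 * tp + fn)
    ppv = 100.0 * tp / (tp + fp)
    sen = 100.0 * tp / (tp + fn)
    return dsc, ppv, sen


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    v = sorted(values)
    pos = (len(v) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)


def directed_distances(src, dst):
    """Distance from every point of src to its nearest point of dst."""
    return [min(math.dist(p, r) for r in dst) for p in src]


def hausdorff(c_pts, p_pts, q=100.0):
    """q = 100 gives Eq. (21); q = 95 gives the 95HD variant described in the lead-in."""
    return max(percentile(directed_distances(c_pts, p_pts), q),
               percentile(directed_distances(p_pts, c_pts), q))


print(overlap_metrics([1, 1, 0, 0], [1, 0, 1, 0]))    # (50.0, 50.0, 50.0)
c_surface = [(0.0, 0.0), (1.0, 0.0)]
p_surface = [(0.0, 0.0), (3.0, 0.0)]
print(hausdorff(c_surface, p_surface), hausdorff(c_surface, p_surface, q=95))   # 2.0 and about 1.9
```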

In addition, the automatic segmentation results were examined and revised slice by slice by senior radiologists. Manual segmentation of skeletal muscle took 20 minutes on average. Radiologists scored each automatic segmentation result according to the following four-level criteria:²⁰ Level-1, no revision (the automatic segmentation is acceptable as is); Level-2, minor revision (less than 10 minutes of revision needed); Level-3, major revision (10 to 20 minutes of revision needed); and Level-4, rejection (more than 20 minutes of revision needed).
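The four-level scoring rule can be written as a small function; treating "no revision" as zero minutes is an assumption about how the criteria map onto revision time. The usage example applies the summary to the level counts reported in the Clinical Assessment results (3, 27 and 2 of 32 patients at Levels 1 to 3).

```python
def revision_level(minutes):
    """Map revision time (minutes) to the four-level score."""
    if minutes == 0:        # no revision needed (assumed to mean zero minutes)
        return 1
    if minutes < 10:        # minor revision
        return 2
    if minutes <= 20:       # major revision: 10 to 20 minutes
        return 3
    return 4                # rejection: more than 20 minutes


def level_distribution(levels):
    """Percentage of cases at each level."""
    n = len(levels)
    return {lv: 100.0 * levels.count(lv) / n for lv in (1, 2, 3, 4)}


levels = [1] * 3 + [2] * 27 + [3] * 2
dist = level_distribution(levels)
print(dist)                  # {1: 9.375, 2: 84.375, 3: 6.25, 4: 0.0}
print(dist[1] + dist[2])     # 93.75: share needing minor or no revision
```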

Sarcopenia Diagnosis

The SMA from the automatic SMA-Net segmentation of the mid-plane slice of each of the 32 test patients was used to predict the SMI, which was calculated with an in-house software platform. In 2016, the CDC established an ICD-10 code for sarcopenia, making it a recognized medical condition.²¹ Sarcopenia was diagnosed with the SMI cut-off established for women (≤ 38.9 cm²/m²).²² The sarcopenia classification was evaluated with a confusion matrix and a receiver operating characteristic (ROC) curve.
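The SMI calculation and the cut-off rule can be sketched as follows; the SMA and height values in the example are hypothetical.

```python
SMI_CUTOFF_WOMEN = 38.9   # cm^2/m^2, cut-off for women used in this study


def skeletal_muscle_index(sma_cm2, height_m):
    """SMI = SMA (cm^2) / height^2 (m^2), in cm^2/m^2."""
    return sma_cm2 / height_m ** 2


def has_sarcopenia(sma_cm2, height_m, cutoff=SMI_CUTOFF_WOMEN):
    """Sarcopenia if the SMI is at or below the cut-off."""
    return skeletal_muscle_index(sma_cm2, height_m) <= cutoff


# Hypothetical patients, both 1.60 m tall.
print(round(skeletal_muscle_index(100.0, 1.60), 2), has_sarcopenia(100.0, 1.60))   # 39.06 False
print(round(skeletal_muscle_index(99.0, 1.60), 2), has_sarcopenia(99.0, 1.60))     # 38.67 True
```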

Results

Patient Characteristics

Detailed information on the remaining 160 patients and the 26,246 annotated CT slices is shown in Table 1.

Table 1.

Baseline Characteristics of 160 Patients Datasets

Datasets	Training (N)	Validation (N)	Testing (N)
No. of patients	116	16	32
Age (years)
≤50	63	7	17
>50	53	9	15
FIGO Stage
I	20	2	3
II	45	6	12
III	43	7	14
IV	5	1	3
Histology
SCC	85	10	22
AD	25	5	8
Other	6	1	2
Sarcopenia
Yes	66	9	19
No	50	7	13

Model Segmentation Performance

We first quantified the geometric metrics of the different models, as shown in Table 2. Our model exhibited the best overall performance, outperforming the other networks on DSC, SEN and 95HD in the testing set (CPFNet achieved the highest PPV), which demonstrates the superiority of the proposed network. The statistical analysis is also presented in Table 1. Visual comparisons of the different models are presented in Figure 4. Our proposed model shows the best overall segmentation performance compared with other state-of-the-art models.

Table 2.

Comparison of the Automatic Delineation Performance in Testing Sets

Type	UNet++	PraNet	CPFNet	DeepLab V3+	UltraNet	Ours
DSC(%)↑	88.16	81.81	85.74	81.95	87.56	89.16
SEN(%)↑	84.25	81.08	79.58	78.55	77.86	88.21
PPV(%)↑	90.27	82.55	92.93	85.66	91.72	90.13
95HD(mm)↓	13.96	14.45	7.65	6.14	6.74	5.30

Figure 4.

Visual comparison of the GT and the different models. The first column shows different CT slices; the second to seventh columns show the GT and the segmentation results of the different models.

The best results are in bold. ↑ means higher value of this metric is better, and ↓ means lower value of this metric is better.

Ablation Experiments

To investigate the effectiveness of our network, ablation experiments were conducted. We constructed an UltraNet baseline model, added the new MHC and MSAA modules to it step by step, and compared the performance quantitatively. Table 3 compares the results of different combinations of components. DSC increases as modules are added, from 87.56% for the baseline to 88.07% with MHC, 88.56% with MSAA and 89.16% with both. These comparisons show that the full network achieved the best overall performance.

Table 3.

Ablation Experimental Results of Our Network

Different modules			Metrics
Base	MHC	MSAA	DSC(%)	SEN(%)	PPV(%)	95HD(mm)
√	-	-	87.56	77.86	91.72	6.74
√	√	-	88.07	83.95	92.62	4.89
√	-	√	88.56	86.98	90.19	7.52
√	√	√	89.16	88.21	90.13	5.30

Clinical Assessment

The automatic segmentation results of our proposed model were reviewed slice by slice by a senior radiologist, who manually revised the predicted results. Figure 5 shows the distribution of the radiologist's scores for the manual revision of skeletal muscle. In the testing set, 3 patients (9.37%) were rated Level-1 (no revision), 27 patients (84.38%) Level-2 (minor revision) and 2 patients (6.25%) Level-3 (major revision); no patient was rated Level-4 (rejection).

Figure 5.

Frequency counts and relative (%) distribution of each level for skeletal muscle testing in oncologist’s scores

Sarcopenia Intelligent Diagnosis

The SMA derived from the predicted segmentations showed near-perfect correlation with the true SMA in the testing set, as shown in Figure 6. The reference standard for sarcopenia diagnosis was L3-SMI ≤ 38.9 cm²/m². The sarcopenia classification was evaluated with a confusion matrix and a receiver operating characteristic (ROC) curve, as shown in Figure 7, and accuracy, precision, recall, F1-score and area under the curve (AUC) were calculated. For patients with cervical cancer, our proposed network predicted sarcopenia with 87.5% accuracy, 92.31% precision, 80% recall, an 85.71% F1-score and an AUC of 0.871.
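The classification metrics follow from the confusion-matrix counts. In the sketch below, the counts TP = 12, FP = 1, FN = 3 and TN = 16 are hypothetical: they are the integer counts over 32 test patients that reproduce the reported accuracy, precision and recall, and they are not read from Figure 7. AUC requires the continuous scores and is not computed here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts consistent with the reported accuracy, precision and recall.
acc, prec, rec, f1 = classification_metrics(tp=12, fp=1, fn=3, tn=16)
print(f"{acc:.2%} {prec:.2%} {rec:.2%} {f1:.2%}")   # 87.50% 92.31% 80.00% 85.71%
```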

Figure 6.

The SMA of predicted results and true results

Figure 7.

The confusion matrix and receiver operating characteristic curve (ROC) in testing datasets

Model Complexity Comparison

The computational cost and parameter counts of the different network models are shown in Table 4. The number of floating-point operations (FLOPs, reported here in GFLOPS, i.e., units of 10⁹ operations) measures the computational complexity of an algorithm or network; this time complexity is reflected in the network's execution time. The number of parameters corresponds to the space complexity, which is reflected in memory size. UltraNet (UltraLight VM-UNet) has the fewest parameters (0.58 M), followed by our network (1.24 M), while our network requires the least computation (1.50 GFLOPS). Our proposed network therefore retains advantages in both computation and parameter count.

Table 4.

The Comparison of Different Model Complexities

	UNet++	PraNet	CPFNet	Deeplab V3+	UltraNet	Ours
GFLOPS	125.60	10.87	12.62	31.22	2.35	1.50
Params/M	17.26	30.50	30.65	5.8	0.58	1.24

Discussion

Skeletal muscle area (SMA) measurement can be used to assess the nutritional risk of cancer patients and to monitor the progression of malnutrition.^23-25 Planning CT is already a necessary part of the radiotherapy process, so SMA can be measured without additional imaging. In addition, studies have found^26-28 that excessive loss of skeletal muscle may lead to poor survival outcomes in many types of cancer, and SMI (SMA divided by the square of height) is an important prognostic predictor.²⁹ Measuring SMA is therefore of great significance for evaluating the nutritional status and prognosis of cancer patients.

Mamba's state space model (SSM) achieves linear computational complexity in long-sequence modeling through a selective state mechanism,³⁰ which makes it well suited to capturing global context in high-resolution medical images. Because Mamba processes long sequences efficiently, its inference throughput has been reported to be up to five times higher than that of a comparable Transformer. It uses the state space to adjust the model state dynamically, allowing the model to selectively propagate or forget information depending on the current input. It simplifies commonly used state space modules and integrates blocks similar to linear attention with multi-layer perceptrons, so it can maintain high performance while reducing model complexity. Ruan et al.¹⁶ were the first to use a medical image segmentation model based purely on the Mamba architecture, for skin lesion segmentation. Wu et al.¹⁴ analyzed the key factors affecting the parameter count of the Mamba architecture and, based on these findings, proposed the parallel vision Mamba (PVM) layer, which achieves the lowest computational load and excellent performance while keeping the total number of channels unchanged. In this work, we proposed a parallel structure comprising a PVM layer and ODConv. Parameters are greatly reduced through dynamic sparsity and parameter sharing, while the parallel, complementary branches preserve the model's expressive power. This combination of dynamic computation and a static, efficient structure offers a new idea for lightweight network design. The SAB + CAB mechanism is used to suppress irrelevant information in the image and highlight important local features. The area of the segmentation result is calculated by an in-house program, and the predicted area is largely consistent with the true area.
The FLOP count (in GFLOPS) reflects the computational cost of a deep learning model and therefore its speed and efficiency when performing training or inference on a given computing device, while the number of parameters reflects the memory it occupies. Considering the lightweight design and efficiency needed for clinical deployment, SMA-Net retains advantages over existing classical networks. About 93.75% of the automatic segmentation results needed only minor or no revision to meet clinical requirements, which shows that deep learning can help doctors greatly reduce their working time.

In recent years, deep learning methods have made important progress in the intelligent segmentation of medical images. In early work, Burns et al.³¹ divided 112 abdominal CT cases into training and test sets and used a UNet to segment skeletal muscle at the first (L1) to fifth (L5) lumbar levels, obtaining DSCs of 0.879, 0.917, 0.930, 0.913 and 0.821, respectively, on the test set. Naser et al.¹³ used deep learning networks to segment cervical skeletal muscle, with DSCs above 0.95. In this study, the DSC of SMA-Net was 89.16%, higher than that of UNet++³² (88.16%) on the same data. The lower DSC than in those studies is mainly attributable to differences between datasets, because deep learning models generalize poorly across datasets. Furthermore, our ablation experiments showed that DSC could be improved from 87.56% to 89.16%. In a comprehensive comparison with previous classical networks, the proposed network outperformed the classical models published in the past and obtained the best overall segmentation accuracy. The cross section of L3 skeletal muscle contains complex interfascicular fat infiltration textures, blood vessels and contrast changes between different muscle groups (such as the rectus abdominis and external oblique). This requires the model not only to capture long-range context to distinguish different muscle regions but also to adapt to abrupt local texture changes, which is why the MHC module uses visual Mamba and dynamic convolution in parallel. In the deepest part of the encoder, the feature map has low resolution but rich semantic information. However, muscle size and proportions in the images differ among patients.
A single-scale receptive field may not capture both the global information of the entire muscle mass and its fine internal fat texture. We therefore introduce MSAA, which extracts features under different receptive fields through parallel multi-scale convolution (or pooling) and automatically weights the importance of each scale through the attention mechanism. For a large muscle, the model may assign a higher weight to the large-receptive-field branch to integrate its overall shape, whereas for fine internal fat texture the small-receptive-field branch provides more important details. Compared with a radiologist's fully manual segmentation, this saves considerable working time, because the doctor needs only a short time to revise the automatic segmentation. In the clinical evaluation, major revisions were required because of misidentification of muscle groups or complete omission of regions with severe fat infiltration. In some low-contrast cases, the model may misclassify part of the transversus abdominis as background or as an adjacent muscle. This error stems from an insufficient understanding of the overall anatomical context and requires doctors to redraw almost the entire outline of the muscle. When the muscle is largely replaced by adipose tissue whose texture closely resembles the background fat, the model may fail to identify the muscle region at all, resulting in large missing segments. This challenges the robustness of the model to abnormal features under pathological conditions.

These results show that the proposed SMA-Net can accurately segment L3 skeletal muscle and rapidly calculate the SMA, largely meeting the requirements of clinical application. It can help clinicians quickly diagnose sarcopenia in patients with cervical cancer. However, this study has limitations, including a limited sample size and the lack of external validation. Future work will incorporate multi-center data to expand the sample size and include more cervical cancer cases, in order to build a more comprehensive sarcopenia diagnosis model.
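The two calculations behind "segment accurately and compute the SMA" can be sketched in plain Python: SMA as the number of muscle pixels multiplied by the pixel area, and pixel-wise DSC, SEN and PPV between a predicted and a manual mask. This is an illustrative sketch, not the study's pipeline; the toy 4 x 4 masks and the 0.8 mm pixel spacing are hypothetical, and in practice the spacing would be read from the CT image header.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # binary mask: 1 = skeletal muscle pixel, 0 = background


def skeletal_muscle_area_cm2(mask: Mask, spacing_mm: Tuple[float, float]) -> float:
    """SMA = number of muscle pixels x area of one pixel (mm^2), converted to cm^2."""
    n_pixels = sum(1 for row in mask for v in row if v)
    row_mm, col_mm = spacing_mm
    return n_pixels * row_mm * col_mm / 100.0  # 1 cm^2 = 100 mm^2


def _ratio(num: int, den: int) -> float:
    # Convention chosen for this sketch: an empty denominator yields 0.0.
    return num / den if den else 0.0


def overlap_metrics(pred: Mask, ref: Mask) -> Dict[str, float]:
    """Pixel-wise DSC, sensitivity (SEN) and positive predictive value (PPV)."""
    tp = fp = fn = 0
    for prow, rrow in zip(pred, ref):
        for p, r in zip(prow, rrow):
            if p and r:
                tp += 1  # muscle in both masks
            elif p:
                fp += 1  # predicted muscle, absent from the manual mask
            elif r:
                fn += 1  # manual muscle missed by the prediction
    return {
        "dsc": _ratio(2 * tp, 2 * tp + fp + fn),  # 2|A∩B| / (|A| + |B|)
        "sen": _ratio(tp, tp + fn),
        "ppv": _ratio(tp, tp + fp),
    }


manual = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred = [[0, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
spacing = (0.8, 0.8)  # hypothetical pixel spacing in mm (row, column)
print(skeletal_muscle_area_cm2(manual, spacing), skeletal_muscle_area_cm2(pred, spacing))
print(overlap_metrics(pred, manual))
```

In this toy example both masks contain six pixels and therefore give the same SMA, yet their DSC is only 10/12 (about 0.83). Agreement in area does not guarantee agreement in shape, which is why the area comparison and the overlap metrics are worth reporting separately.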

Footnotes

ORCID iDs

Liming Lu

Zhe Wu

Ethical Considerations

The study protocol was approved by the Ethics Committee of Huiyang Sanhe Hospital (approval No. Ethics [M] 2025-017). The rights and interests of the subjects were fully protected, in accordance with the requirements of the medical ethics committee. The requirement for patient consent was waived because of the retrospective nature of the study.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJQN202512819) and by the Nuclear Energy Development Research Project of the State Administration of Science, Technology and Industry for National Defense (SASTIND) (Grant No. HNKF202224(28)).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Sung, Ferlay, Siegel, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209-249.

2. Eminowicz, Vaja, Gallardo, et al. Induction chemotherapy followed by chemoradiation in locally advanced cervical cancer: Quality of life outcomes of the GCIG INTERLACE trial. Eur J Cancer. 2025;220:115375.

3. Marchetti, De Felice, Di Pinto, et al. Survival Nomograms after Curative Neoadjuvant Chemotherapy and Radical Surgery for Stage IB2-IIIB Cervical Cancer. Cancer Res Treat. 2018;50(3):768-776.

4. Lee, Lin, et al. Association of bowel radiation dose-volume with skeletal muscle loss during pelvic intensity-modulated radiotherapy in cervical cancer. Support Care Cancer. 2021;29(9):5497-5505.

5. Deng, et al. A novel skeletal muscle quantitative method and deep learning-based sarcopenia diagnosis for cervical cancer patients treated with radiotherapy. Med Phys. 2025;52(5):1-11.

6. Lee, Chen, Jan, Chen. Association of patient-reported outcomes and nutrition with body composition in women with gynecologic cancer undergoing post-operative pelvic radiotherapy: an observational study. Nutrients. 2021;13(8):2629.

7. Liguori, Russo, Aran, et al. Sarcopenia: assessment of disease burden and strategies to improve outcomes. Clin Interv Aging. 2018;13:913-927.

8. Lee, Chang, Lin, et al. Skeletal Muscle Loss Is an Imaging Biomarker of Outcome after Definitive Chemoradiotherapy for Locally Advanced Cervical Cancer. Clin Cancer Res. 2018;24(20):5028-5036.

9. Wang, et al. Clinical target volume (CTV) automatic delineation using deep learning network for cervical cancer radiotherapy: A study with external validation. J Appl Clin Med Phys. 2025;26(1):e14553.

10. Ronneberger, Fischer, Brox. U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). 2015;9351:234-241.

11. Tseng, Liu, Yang, et al. Performance assessment of variant UNet-based deep-learning dose engines for MR-Linac-based prostate IMRT plans. Phys Med Biol. 2023;68(17):175004.

12. Amarasinghe, Lopes, Beraldo, et al. A Deep Learning Model to Automate Skeletal Muscle Area Measurement on Computed Tomography Images. Front Oncol. 2021;11:580806.

13. Naser, Wahid, Grossberg, et al. Deep learning auto-segmentation of cervical skeletal muscle for sarcopenia analysis in patients with head and neck cancer. Front Oncol. 2022;12:930432.

14. Liu, Liang, et al. UltraLight VM-UNet: Parallel Vision Mamba significantly reduces parameters for skin lesion segmentation. Patterns. 2025;6(11):101298. doi:10.1016/j.patter.2025.101298

15. Chamchod, Fuller, Mohamed, et al. Quantitative body mass characterization before and after head and neck cancer radiotherapy: A challenge of height-weight formulae using computed tomography measurement. Oral Oncol. 2016;61:62-69.

16. Ruan, Xiang. VM-UNet: Vision Mamba UNet for Medical Image Segmentation. arXiv:2402.02491. 2024. doi:10.48550/arXiv.2402.02491

17. Xin, et al. Omni-dimensional dynamic convolution feature coordinate attention network for pneumonia classification. Vis Comput Ind Biomed Art. 2024;7:17.

18. Duprez, Trauernicht, Simonds, Williams. Self-configuring nnU-Net for automatic delineation of the organs at risk and target in high-dose rate cervical brachytherapy, a low/middle-income country's experience. J Appl Clin Med Phys. 2023;24(8):e13988.

19. Deng, Dong, Socher, et al. ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2009; Miami, FL. IEEE; 2009:248-255.

20. Hou, Gao, Liu, et al. Clinical evaluation of deep learning-based automatic clinical target volume segmentation: a single-institution multi-site tumor experience. Radiol Med. 2023;128(10):1250-1261.

21. Anker, Morley, von Haehling. Welcome to the ICD-10 code for sarcopenia. J Cachexia Sarcopenia Muscle. 2016;7:512-514.

22. Mourtzakis, Prado CMM, Lieffers, Reiman, McCargar, Baracos. A practical and precise approach to quantification of body composition in cancer patients using computed tomography images acquired during routine care. Appl Physiol Nutr Metab. 2008;33:997-1006.

23. Aredes, Garcez, Chaves. Influence of chemoradiotherapy on nutritional status, functional capacity, quality of life and toxicity of treatment for patients with cervical cancer. Nutr Diet. 2018;75(3):263-270.

24. Op den Kamp CMH, De Ruysscher DKM, van den Heuvel, et al. Early body weight loss during concurrent chemo-radiotherapy for non-small cell lung cancer. J Cachexia Sarcopenia Muscle. 2014;5(2):127-137.

25. Cho, Jeon, et al. Relationship Between Sarcopenia and Prognosis in Patient With Concurrent Chemo-Radiation Therapy for Esophageal Cancer. Front Oncol. 2019;9:366.

26. Deantoni, Mirabile, Chiara, et al. Impact of low skeletal muscle mass in oropharyngeal cancer patients treated with radical chemo-radiotherapy: A mono-institutional experience. Tumori. 2024;110(2):116-123.

27. Tan, Birdsell, Martin, Baracos, Fearon KCH. Sarcopenia in an overweight or obese patient is an adverse prognostic factor in pancreatic cancer. Clin Cancer Res. 2009;15(22):6973-6979.

28. Reisinger, Bosmans, Uittenbogaart, et al. Loss of Skeletal Muscle Mass During Neoadjuvant Chemoradiotherapy Predicts Postoperative Mortality in Esophageal Cancer Surgery. Ann Surg Oncol. 2015;22(13):4445-4452.

29. Park, Yang, Chung, et al. Skeletal muscle gauge as a prognostic factor in patients with colorectal cancer. Cancer Med. 2021;10(23):8451-8461.

30. Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752. 2023.

31. Burns, Yao, Chalhoub, Chen, Summers. A Machine Learning Algorithm to Estimate Sarcopenia on Abdominal CT. Acad Radiol. 2020;27(3):311-320.

32. Zhou, Siddiquee MMR, Tajbakhsh, Liang. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE Trans Med Imaging. 2020;39(6):1856-1867.

A Lightweight Skeletal Muscle Intelligent Segmentation Network Based on Planning CT for Cervical Cancer Radiotherapy
