Sage Journals: Discover world-class research

Abstract

Scribble-supervised semantic segmentation has emerged as a promising alternative to fully supervised methods in medical imaging, owing to its low annotation cost and the contour information that scribbles inherently contain. However, the sparse supervision provided by scribble annotations, coupled with the underutilization of contour cues, often results in incomplete boundary representations in the pseudo labels generated by the model. This limitation hinders the model’s ability to accurately capture complex anatomical structures and poses a significant challenge to precise segmentation. To address this issue, we propose a novel three-branch network architecture. Built upon a multi-task learning framework, the model introduces a contour auxiliary labels supervision (CALS) mechanism and a contour attention (CA) module to enhance the auxiliary decoder’s ability to extract contour features. In addition, we design a contour mixed pseudo labels supervision (CMPLS) strategy, which incorporates contour-enhanced representations into the pseudo-label generation process, thereby providing more informative and higher-quality supervision for scribble-based learning. We evaluate our method on the ACDC, MSCMR, and SegPC-2021 datasets. Experimental results demonstrate that our approach consistently outperforms state-of-the-art methods in terms of accuracy, robustness, and generalization. The scribble annotations and experimental code for the SegPC-2021 dataset are available on GitHub.

Keywords

scribble annotation, medical image segmentation, weakly supervised learning, pseudo labels

1. Introduction

Semantic segmentation has long been a critical task in medical image analysis. In clinical practice, segmentation accuracy is essential for both diagnosis and prognostic assessment. Recently, convolutional neural networks (CNNs) and transformer-based models have achieved remarkable success in medical image segmentation (Ronneberger et al., 2015; Valanarasu et al., 2021). Most of these models rely on pixel-level annotations from large-scale datasets that are densely and precisely labeled. However, manual annotation of medical images is both time-consuming and costly.

To address this challenge, researchers have increasingly explored new techniques that do not require large-scale, fully annotated datasets. Among them, weakly supervised learning has gained considerable attention as a promising alternative (Li et al., 2023; Luo et al., 2022; Obukhov et al., 2019; Wang et al., 2023a). Weakly supervised semantic segmentation typically leverages loosely annotated labels—such as points within regions of interest, scribbles, and bounding boxes—to train models (Chen & Sun, 2023; Cheng et al., 2021; Dai et al., 2015; Lin et al., 2016; Ying et al., 2023). These approaches significantly reduce annotation costs and alleviate the burden on clinical experts.

Scribbles inherently contain rich contour information and, compared with other forms of weak annotation such as points and bounding boxes, offer clearer guidance for medical image segmentation. As illustrated by the “Scribble Annotation” examples in Figure 3, scribble supervision is a form of weak supervision in which annotators only need to draw a few strokes over the target regions. Scribble supervision has been widely applied in various medical imaging tasks, including cardiac magnetic resonance imaging (MRI) segmentation (Bernard et al., 2018; Zhuang, 2016, 2019) and uterine cancer MRI segmentation (Ying et al., 2023).

Although scribble annotations substantially reduce reliance on expensive and time-consuming expert-labeled segmentation masks while providing rich contour information, existing studies have often underutilized this critical cue. As a result, their models struggle to learn the detailed visual features necessary for precise boundary delineation.

To address this challenge, we propose a novel three-branch segmentation model. The model first learns contour representations independently through a contour auxiliary labels supervision (CALS) mechanism and a contour attention (CA) module, and then integrates these features into the pseudo-label generation process via a contour mixed pseudo labels supervision (CMPLS) strategy to produce more complete and structurally accurate pseudo labels.

Specifically, inspired by the principles of multi-task learning, we adopt a three-branch architecture consisting of a shared encoder and three decoders. Two of the decoders serve as primary branches, which learn from scribble annotations using the partial cross-entropy (pCE) loss. The third decoder, in the CALS branch, learns contour representations from auxiliary contour labels. These contour labels are generated during preprocessing using the Canny edge detection algorithm and are trained via an edge-based loss function. We observed that the output of the CALS decoder alone lacks sufficiently prominent contour representations, making it suboptimal when used in isolation. To remedy this, we introduce the CA module, which takes the CALS decoder’s output as input and applies a convolutional block attention mechanism to generate a refined contour feature map that emphasizes key boundary regions. This contour feature map is then fused with the outputs of the two primary decoders through element-wise multiplication and a dynamic mixing operation, resulting in pseudo labels that preserve more complete shape information.

Ultimately, the proposed method integrates both contour-assisted labels and enhanced pseudo labels for end-to-end training of the segmentation network. By effectively leveraging the contour information inherent in scribble supervision, our model achieves more accurate segmentation masks. The CALS mechanism contributes additional structural cues, the CA module guides the model’s attention toward critical contour regions, and the CMPLS strategy embeds these contour-aware features into the pseudo-label generation process. This joint strategy is designed to substantially improve the accuracy and reliability of segmentation in medical image analysis.

The main contributions of this paper are as follows:

We propose an innovative three-branch model combined with a CMPLS strategy to enhance the performance of scribble-based medical image segmentation. By effectively integrating the refined contour features learned from the auxiliary branch into the dynamic fusion of the primary branches, this model significantly improves the shape integrity of the generated pseudo labels.

We introduce a CALS mechanism and a CA module to substantially enhance the segmentation performance of the model. The CALS mechanism employs a multi-task learning strategy to extract additional contour features from contour auxiliary labels. Concurrently, the CA module utilizes convolutional block attention to effectively sharpen the model’s focus and precision in capturing critical contour regions.

To evaluate our model, we conducted extensive experiments on the SegPC-2021 dataset and two benchmark datasets, ACDC and MSCMR. The results indicate that our method not only outperforms existing scribble-supervised methods but also significantly narrows the gap with mask-supervised methods, even surpassing them in certain cases.

2. Related Work

2.1. Pseudo Labels for Semantic Segmentation

Pseudo-labeling was originally developed in the context of semi-supervised learning. It typically involves using a base network pretrained on labeled data to generate predictions for unlabeled data, which are then treated as pseudo labels. These pseudo labels enable the incorporation of unlabeled samples into model training and have been widely adopted in tasks such as image classification and segmentation (Chen et al., 2021, 2020; Filipiak et al., 2021; Sohn et al., 2020; Teh et al., 2021; Wang et al., 2021b, 2023c; Yang et al., 2021; Zhu et al., 2024).

Inspired by pseudo-label techniques in semi-supervised learning, researchers have extended this concept to weakly supervised segmentation. Weakly supervised semantic segmentation leverages various forms of weak annotation (e.g., image-level tags, points, scribbles, and bounding boxes) to generate mask pseudo labels for images, simulating fully supervised segmentation during network training (Can et al., 2018; Lin et al., 2024; McEver & Manjunath, 2020; Wang et al., 2018). Current research in this area can be broadly categorized into four groups based on the type of supervision signal used: image-level tag supervision (Lin et al., 2024; Wang et al., 2018), point supervision (McEver & Manjunath, 2020; Ren et al., 2024), scribble supervision (Can et al., 2018; Lee & Jeong, 2020), and bounding box supervision (Dai et al., 2015).

Image-level tag supervision relies solely on the target class information within the image. Wang et al. (2018) used class activation maps to identify initial object regions and extracted common features to expand them into pseudo labels, followed by iterative optimization. Point supervision annotates the target region with a sparse set of points to guide network training. McEver and Manjunath (2020) introduced a three-stage process based on point-level annotations: first, point annotations are used to train point-supervised class activation maps (PCAMs); then, the PCAMs are employed to train IRNet, which predicts pixel relationships and generates coarse class labels; finally, the random walk algorithm (Grady, 2006) is used to generate pseudo labels for training a fully supervised network. Ren et al. (2024) further explored class-label classification: leveraging superpixels, they combined spatial classification with spectral classification from hyperspectral imaging to generate high-quality pseudo labels.

In the domain of scribble pseudo labels, Can et al. (2018) initially employed random walks (Grady, 2006) to generate preliminary pseudo labels for training the segmentation network. They further refined these labels by using conditional random fields (CRFs; Lafferty et al., 2001) to re-annotate the network outputs, and adjusted the labels during training through uncertainty estimation. In contrast to their pseudo-label generation approach, Lee and Jeong (2020) proposed a two-stage training method. During the “warm-up” phase, the network is trained solely with scribble annotations; after the warm-up phase, pseudo labels are generated by averaging the predictions of multiple iterations and are iteratively refined in subsequent training. Bounding box supervision uses only the bounding boxes that enclose the targets to guide the model’s learning. Dai et al. (2015) applied unsupervised region proposal methods (van de Sande et al., 2011a, 2011b) to generate candidate pseudo labels within the bounding boxes, and these pseudo labels are iteratively refined based on the network’s output to achieve precise segmentation guidance.

It is worth noting that many weakly supervised methods rely on unsupervised segmentation techniques—such as class activation maps and superpixels (Dai et al., 2015; Lin et al., 2024; Ren et al., 2024; Wang et al., 2018)—or traditional algorithms such as random walk (Can et al., 2018; McEver & Manjunath, 2020) to generate pseudo labels. However, these approaches often suffer from poor generalizability in practical applications and struggle to adapt to the complexities of medical image segmentation. In contrast, our method generates pseudo labels with more complete shape representations for scribble supervision by dynamically fusing information from multiple decoders and incorporating the contour features learned by an auxiliary decoder.

2.2. Scribble-Supervised Semantic Segmentation

Following previous studies (Li et al., 2023) and based on their main technical differences, scribble-supervised semantic segmentation methods can be broadly categorized into three groups: (1) mix augmentation methods, (2) different scribble supervision loss functions, and (3) pseudo-label generation methods.

(1) Mix augmentation methods. Mix augmentation is an important data augmentation strategy that combines multiple images and their corresponding labels to generate additional training samples containing information from several categories. Although the mixed images may not always look realistic, they contribute substantially to training (Chaitanya et al., 2019). Currently, only a few scribble-supervised semantic segmentation methods are based on mix augmentation, and all of them are trained based on the principle of consistency (Lei et al., 2024; Zhang & Zhuang, 2022; Zheng et al., 2022). Among them, Zhang and Zhuang (2022) were the first to apply mix augmentation to this field. They increased the proportion of scribble annotations and used random occlusion as part of the mix augmentation. Their method is based on cycle consistency: the network either predicts first and then augments the image, or augments first and then predicts, and the difference between the two results guides the network to learn and integrate diverse features.

(2) Different scribble supervision loss functions. Because scribble annotations cover only a small fraction of the pixels in an image, it is challenging to leverage these limited annotations effectively to guide network training and produce reliable model outputs. To address this, researchers have proposed a diverse array of loss functions from various perspectives, such as extending supervision signals (Kim & Ye, 2019; Lin et al., 2016; Wang et al., 2023b) and regularization-based approaches (Liu et al., 2021; Obukhov et al., 2019; Zhang & Zhuang, 2023). Lin et al. (2016) proposed the pCE loss, which computes the cross-entropy loss only for pixels with scribble annotations; through training, the supervision signals from the annotated pixels propagate to the unannotated ones. Kim and Ye (2019) observed a striking similarity between the softmax layer output and the characteristic functions in the Mumford–Shah functional, which led them to propose a new loss function based on the Mumford–Shah functional for training networks. Wang et al. (2023b) introduced a balancing label preference loss for scribble-supervised semantic segmentation, which balances the weights of pixels with different annotation probabilities and propagates supervision to unlabeled regions through a local aggregation module. Obukhov et al. (2019) developed the gated CRF (GCRF) loss, which refines boundaries and enhances label consistency through a CRF-based gating mechanism. Liu et al. (2021) proposed using uncertainty measures derived from teacher network predictions to guide the student network’s learning. Recently, Zhang and Zhuang (2023) proposed a Zen-inspired regularization loss based on the principles of balance and simplification, which streamlines the learning process by balancing complexity and accuracy.

(3) Pseudo label generation methods. In scribble-based supervision, pseudo-label generation methods aim to create high-quality mask pseudo labels from limited scribble annotations, thereby simulating fully supervised segmentation scenarios. Most current approaches are based on a dual-branch architecture (Li et al., 2023, 2024; Liu et al., 2023; Wang et al., 2023a; Zhou et al., 2023) initially proposed by Luo et al. (2022), whose network generates a random weight between zero and one to mix the outputs of the two branches, creating more learnable pseudo labels. This approach effectively addresses the issue of each branch retaining its own predictions, thereby improving the quality of the pseudo labels and the overall performance of the model. Wang et al. (2023a) proposed a spatial–spectral dual-branch method for polyp segmentation, leveraging the complementarity of different feature representations and mutual learning between the two branches to generate high-quality pseudo labels. However, this method is highly sensitive to spatial and color features, making it less suitable for grayscale images such as MRI and computed tomography scans. Zhou et al. (2023) built upon the dynamic mixed pseudo labels supervision of Luo et al. (2022) by employing superpixel-guided random walks to supplement the pseudo labels. Li et al. (2024) were the first to introduce transformers into scribble-supervised medical image segmentation. They employed a dual-branch architecture combining transformers and CNNs to integrate global and local information effectively, generating pseudo labels with a dynamic mixing approach.

Despite the achievements of previous methods in medical image segmentation, the pseudo labels they generate often lack complete shape features. To overcome this limitation, we integrate contour auxiliary labels extracted in the preprocessing stage into the network through multi-task learning, thereby enhancing the model’s ability to learn contour features. In addition, we introduce a CA module that further strengthens the contour features extracted from the images. Through a CMPLS strategy, we incorporate these contour features into the pseudo-label generation process to produce high-quality pseudo labels with richer shape information. This strategy not only improves the model’s ability to recognize the shape of the target structure but also enhances the overall performance of the segmentation task.

3. Methodology

3.1. Overall Architecture

As illustrated in Figure 1, the proposed three-branch network architecture consists of a shared encoder $θ_{e}$ and three slightly different decoders: $θ_{d 1}$ , $θ_{d 2}$ , and $θ_{d 3}$ . The shared encoder learns common image features, while the three decoders individually capture diverse representations based on the shared features. These heterogeneous features are dynamically fused with contour information to generate pseudo labels that exhibit more complete shape characteristics. Decoders $θ_{d 1}$ and $θ_{d 2}$ are responsible for learning segmentation features from the shared representations and are jointly used to produce pseudo labels. Decoder $θ_{d 3}$ serves as an auxiliary branch that learns contour features from contour-assisted labels and integrates these features into the pseudo-label generation process.

Figure 1.

The Overall Structure of the Proposed Framework. The Framework Consists of an Encoder and Three Decoders. Decoders $θ_{d 1}$ and $θ_{d 2}$ Cooperate to Dynamically Mix Their Outputs and Incorporate Contour Attention Information to Generate Pseudo Labels for Further Network Training. Meanwhile, Decoder $θ_{d 3}$ is Used to Learn the Contour Representation and Generate Contour Attention Information.

If a single branch is used for segmentation while another serves as an auxiliary branch to learn contour information, the network may suffer from an inherent issue described by Huo et al. (2021): each branch tends to preserve its own predictions, which hinders effective model updates. To address this, we adopt a dual-decoder structure inspired by Luo et al. (2022), in which two decoders with different perturbations independently learn segmentation predictions that are then dynamically fused. Unlike Luo et al. (2022), however, we introduce an additional CALS branch that learns supplementary contour features from perturbed inputs and contributes them to the pseudo-label supervision process. This design introduces structurally meaningful perturbations, further decouples the learning pathways of the different branches, and alleviates their tendency to reinforce their initial predictions. During training, the CALS branch and the CA module progressively enhance the model’s ability to extract contour features, guiding its predictions toward critical structural regions.

Specifically, the decoders $θ_{d 1}$, $θ_{d 2}$, and $θ_{d 3}$ produce the outputs $y_{1}$, $y_{2}$, and $y_{3}$, respectively. The CA module then transforms the output of the auxiliary decoder $θ_{d 3}$ into a contour-aware feature map $y_{3CA}$. This map is multiplied element-wise with $y_{1}$ and $y_{2}$ to obtain the contour-enhanced outputs $y_{1CA}$ and $y_{2CA}$. These enhanced predictions are dynamically fused to generate the final pseudo labels, which, together with the contour auxiliary labels and the scribble annotations, supervise the end-to-end training of the three-branch network.

3.2. Contour Mixed Pseudo Labels Supervision (CMPLS)

To generate pseudo labels with more complete shape features, we adopt the CMPLS strategy. By incorporating additional shape information obtained from the CALS branch into the pseudo-label generation process of Luo et al. (2022), this strategy enhances the model’s feature extraction and representation learning capabilities. As illustrated in Figure 1, $θ_{d 2}$ is designated as the main decoder, and its output serves as the final segmentation result. The pseudo labels are generated as follows:

\begin{aligned} PL = \operatorname{argmax} \left[ α \times y_{1CA} + (1 - α) \times y_{2CA} \right], \end{aligned}

(1)

where PL represents the generated pseudo labels, and $α$ is a value drawn randomly from $[0, 1]$ in each iteration. This strategy enhances the diversity of pseudo labels by severing the gradient flow between $θ_{d 1}$ and $θ_{d 2}$, maintaining their independence. Meanwhile, the contour-enhanced outputs $y_{1CA}$ and $y_{2CA}$, obtained by multiplying $y_{1}$ and $y_{2}$ element-wise with $y_{3CA}$, incorporate additional features learned by $θ_{d 3}$, which further reinforces the independence among the decoders and results in pseudo labels with more complete shape characteristics.
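Equation (1) can be traced with a minimal plain-Python sketch, written without a deep learning framework so that it runs on its own. It assumes that $y_{1}$ and $y_{2}$ are per-pixel class scores for a flattened image and that $y_{3CA}$ holds one attention weight per pixel; the function name `mix_pseudo_labels` and this data layout are illustrative assumptions, not the authors’ code. In a framework such as PyTorch, the mixed map would additionally be detached from the computation graph to sever the gradient flow.

```python
import random
from typing import List, Optional


def mix_pseudo_labels(y1: List[List[float]], y2: List[List[float]],
                      att: List[float], alpha: Optional[float] = None) -> List[int]:
    """Hypothetical sketch of equation (1) on a flattened image.

    y1, y2: per-pixel class scores from decoders θd1 and θd2 (pixels x classes).
    att:    per-pixel contour attention weight (the role of y3CA), in [0, 1].
    alpha:  mixing weight; drawn uniformly from [0, 1] when not given.
    """
    if alpha is None:
        alpha = random.random()  # a new α is drawn in each training iteration
    labels = []
    for p1, p2, w in zip(y1, y2, att):
        # Contour enhancement: element-wise product with the attention weight.
        y1ca = [w * v for v in p1]
        y2ca = [w * v for v in p2]
        # Dynamic mixing of the two contour-enhanced predictions.
        mixed = [alpha * a + (1 - alpha) * b for a, b in zip(y1ca, y2ca)]
        # argmax over classes gives the hard pseudo label for this pixel.
        labels.append(max(range(len(mixed)), key=mixed.__getitem__))
    return labels
```

In this sketch the attention weight is a single non-negative scalar per pixel, so it rescales all class scores of that pixel by the same factor; the text does not state whether the original implementation applies $y_{3CA}$ to logits or to probabilities.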

During training, decoders $θ_{d 1}$ and $θ_{d 2}$ learn features collaboratively and generate their respective segmentation predictions, which are combined through the dynamic mixing strategy. Concurrently, decoder $θ_{d 3}$ focuses on learning contour features and produces contour feature maps. CMPLS dynamically integrates these contour maps with the segmentation predictions to create the pseudo labels used to supervise the network. By incorporating the contour feature map $y_{3CA}$ into the pseudo-label generation process, we compensate for the limited contour information in scribble annotations with more detailed shape features. As training progresses, the decoders increasingly focus on contour regions, which improves the model’s overall accuracy and effectiveness.

The loss function used for training the CMPLS is defined as follows:

\begin{aligned} L_{CMPLS} = L_{Dice} (PL, y_{1CA}) + L_{Dice} (PL, y_{2CA}) . \end{aligned}

(2)

In equation (2), $L_{Dice}$ denotes the widely used Dice loss. Deep learning methods perform semantic segmentation by classifying each pixel, and semantic segmentation datasets often exhibit a significant imbalance between foreground and background pixels. Because the Dice loss measures the overlap between the predicted and target regions rather than averaging per-pixel errors, it is less dominated by the majority class than pixel-wise losses, which makes it well suited to such imbalanced data. We therefore adopt the Dice loss to better address this imbalance.
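The Dice term in equation (2) can be sketched in plain Python as follows. The sketch assumes a soft Dice loss averaged over classes, with the hard pseudo labels converted to one-hot form and a small smoothing constant `eps`; the function name and the smoothing value are illustrative assumptions, since the text does not state which Dice variant was used.

```python
from typing import List


def dice_loss(pred: List[List[float]], target: List[int], eps: float = 1e-5) -> float:
    """Soft Dice loss averaged over classes (hypothetical sketch).

    pred:   per-pixel class probabilities (pixels x classes).
    target: per-pixel hard labels, e.g. the pseudo labels PL of equation (1).
    """
    num_classes = len(pred[0])
    total = 0.0
    for c in range(num_classes):
        p = [px[c] for px in pred]                        # predicted map for class c
        g = [1.0 if t == c else 0.0 for t in target]      # one-hot target for class c
        inter = sum(a * b for a, b in zip(p, g))          # soft intersection
        denom = sum(p) + sum(g)
        total += 1.0 - (2.0 * inter + eps) / (denom + eps)
    return total / num_classes
```

Under this sketch, $L_{CMPLS}$ would be `dice_loss(p1, pl) + dice_loss(p2, pl)`, where `p1` and `p2` are hypothetical per-pixel class probabilities derived from $y_{1CA}$ and $y_{2CA}$, and `pl` holds the pseudo labels from equation (1).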

Simultaneously, we utilize the loss function defined in equation (3) to learn target segmentation from the scribble annotations.

\begin{aligned} L_{{pCE}_{all}} = L_{pCE} (y_{1CA}, s) + L_{pCE} (y_{2CA}, s), \end{aligned}

(3)

where $s$ denotes the one-hot scribble annotations and $L_{pCE}$ denotes the pCE loss. The pCE loss ignores unlabeled pixels and computes the loss only over labeled pixels, which allows the network to learn useful information from weak scribble labels.
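The pCE loss can be illustrated with a short plain-Python sketch; it is not the authors’ implementation. Scribble labels are stored as class indices (equivalent to the one-hot $s$) with `None` marking unannotated pixels, and the loss is averaged over annotated pixels only; the averaging choice, the `eps` guard, and the function name are assumptions.

```python
import math
from typing import List, Optional


def partial_cross_entropy(probs: List[List[float]], scribble: List[Optional[int]],
                          eps: float = 1e-8) -> float:
    """Cross-entropy averaged over scribble-annotated pixels only (hypothetical sketch).

    probs:    per-pixel class probabilities (pixels x classes).
    scribble: class index for annotated pixels, None for unannotated ones.
    """
    # Unannotated pixels are skipped entirely, so they contribute no loss.
    losses = [-math.log(p[label] + eps)
              for p, label in zip(probs, scribble) if label is not None]
    if not losses:  # no annotated pixel in this image
        return 0.0
    return sum(losses) / len(losses)
```

Changing the prediction at an unannotated pixel leaves this loss unchanged, which is the property that lets the network train on sparse scribbles.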

3.3. Contour Auxiliary Labels Supervision (CALS)

To provide additional contour features for pseudo labels and enhance the model’s ability to mine contour characteristics, we propose CALS. This mechanism consists of two key components: the supervision mechanism and the CA module, designed to optimize the model’s effectiveness in learning contour features.

3.3.1. Supervision Mechanism

To enhance the network’s ability to perceive the structure of segmentation targets, we introduce an auxiliary branch designed to learn contour features (see the CALS part of Figure 1). Inspired by multi-task learning, this branch computes a loss between its processed predictions and the contour auxiliary labels, guiding the model to progressively align its predicted contour features with these labels.

Specifically, we employ the Canny algorithm (Canny, 1986) to extract the edges of the segmentation target structure from the image. Since an object’s contour can be viewed as a connection of its edges (Arbeláez et al., 2011), we treat the extracted edges as contours and store them, thereby avoiding additional annotation work. In practice, we only need to load the pre-saved contour maps alongside the other image information. During training, the output of decoder $θ_{d 3}$ is $y_{3} \in R^{OutChannel \times H \times W}$, where OutChannel is the number of output channels of $θ_{d 3}$. We input $y_{3}$ into the CA module to obtain $y_{3CA} \in R^{1 \times H \times W}$, which contains prominent contour features. We then calculate the auxiliary loss between $y_{3CA}$ and the contour map $y_{CL}$ and backpropagate it to progressively align $y_{3CA}$ with $y_{CL}$, thereby optimizing the model’s ability to extract contour features. To further enhance contour feature learning, we incorporate the active boundary loss (Wang et al., 2021a) into the loss calculation for this module. During validation and testing, the CALS branch can be discarded without affecting the model’s output, because the final segmentation result is taken from the main decoder $θ_{d 2}$.

Throughout end-to-end training, our method gradually aligns the predicted boundaries with the contour auxiliary labels, thereby improving the model’s ability to perceive contours.

We utilize $L_{EDGE}$ as the loss function to supervise decoder $θ_{d 3}$ , and its mathematical expression is as follows:

\begin{aligned} L_{EDGE} = L_{CE} (y_{CL}, y_{3}) + L_{ABL} (y_{CL}, y_{3}), \end{aligned}

(4)

where $L_{CE}$ is the cross-entropy loss, $y_{CL}$ denotes the contour auxiliary labels, and $L_{ABL}$ is the active boundary loss (ABL; Wang et al., 2021a).

Furthermore, the loss function for the entire network training is expressed in equation (5):

\begin{aligned} L_{total} = λ_{1} \cdot L_{{pCE}_{all}} + λ_{2} \cdot L_{CMPLS} + λ_{3} \cdot L_{EDGE} . \end{aligned}

(5)

In our experiments, the values of $λ_{1}$ , $λ_{2}$ , and $λ_{3}$ are set to 0.5, 0.25, and 0.5, respectively.
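Substituting these weights, together with equations (2), (3), and (4), into equation (5) gives the full training objective term by term; this is a direct expansion with no additional assumptions:

\begin{aligned} L_{total} = {} & 0.5 \cdot \left[ L_{pCE} (y_{1CA}, s) + L_{pCE} (y_{2CA}, s) \right] \\ & + 0.25 \cdot \left[ L_{Dice} (PL, y_{1CA}) + L_{Dice} (PL, y_{2CA}) \right] \\ & + 0.5 \cdot \left[ L_{CE} (y_{CL}, y_{3}) + L_{ABL} (y_{CL}, y_{3}) \right]. \end{aligned}

Each pseudo-label Dice term therefore carries half the weight of a scribble pCE term, while the two contour terms are weighted equally with the pCE terms.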

Applying these losses jointly makes the network attend simultaneously to the scribble labels, the contour labels, and the network-generated pseudo labels. As a result, the network also generates pseudo labels that focus more on contours, which is useful for segmenting other elliptical or circular targets.

3.3.2. CA Module

After introducing the CALS strategy, we observed a decline in convergence speed when the auxiliary branch relied solely on $y_{3}$ and contour auxiliary labels to compute the loss. We believe this is due to the lack of prominent contour features in $y_{3}$ , which hinders effective feature extraction. To address this issue, we incorporated the CA module. The CA module leverages a convolutional attention mechanism that adaptively extracts important contour-related information from the feature map while suppressing irrelevant or redundant features. By combining max-pooling and average-pooling operations, the CA module handles multi-scale feature representations, capturing both local details and global contour structures. These multi-scale features are then concatenated and passed through a large $7 \times 7$ convolutional kernel to extract complete boundary information. Finally, the sigmoid activation function emphasizes the key contour features while reducing noise interference.

The integration of the CA module with CALS not only enhances the auxiliary branch’s ability to perceive contour features but also accelerates its convergence while maintaining consistency with the contour auxiliary labels during training. Furthermore, the output $y_{3CA}$ of the CA module contains rich contour features, which are incorporated into the pseudo-label generation process to simulate a fully supervised segmentation scenario. The inclusion of contour features brings the generated pseudo labels closer to the ground truth, thus providing stronger guidance for improving the model’s segmentation performance.

The module takes the output $y_{3}$ of decoder $θ_{d 3}$ as input. As illustrated in Figure 2, the input features first undergo max pooling and average pooling along the channel dimension to obtain the feature maps $Max \in R^{1 \times H \times W}$ and $Avg \in R^{1 \times H \times W}$, respectively. These feature maps are then concatenated along the channel dimension, resulting in a combined feature map of size $R^{2 \times H \times W}$. Subsequently, this combined feature map is processed by a convolutional layer with a $7 \times 7$ kernel to produce a feature map of size $R^{1 \times H \times W}$. After a sigmoid function is applied, the resulting attention weights $y_{3CA} \in R^{1 \times H \times W}$ lie within the $[0, 1]$ range. Through this process, contour features are prominently highlighted in the attention weights. The CA module is formulated in equation (6):

\begin{aligned} X_{CA} = \mathrm{Sigmoid} \left( \mathrm{Conv}^{7 \times 7} \left( \left[ \mathrm{AvgPool} (X), \mathrm{MaxPool} (X) \right] \right) \right), \end{aligned}

(6)

where $X$ is the input feature.
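Equation (6) can be traced on a single feature map with the following plain-Python sketch: channel-wise average and max pooling, concatenation into two channels, one $7 \times 7$ convolution with zero padding (so the spatial size is preserved), and a sigmoid. It assumes the kernel weights and bias are given; in the actual module they would be learned, and the name `contour_attention` is hypothetical.

```python
import math
from typing import List

Map = List[List[float]]  # an H x W map


def contour_attention(x: List[Map], weight: List[List[List[float]]],
                      bias: float = 0.0) -> Map:
    """Hypothetical sketch of Sigmoid(Conv7x7([AvgPool(X), MaxPool(X)])).

    x:      input feature map (e.g. y3) with shape C x H x W.
    weight: kernel of shape 2 x 7 x 7 (index 0 -> avg map, index 1 -> max map).
    """
    h, w = len(x[0]), len(x[0][0])
    # Pool along the channel dimension: each result is a 1 x H x W map.
    avg = [[sum(ch[i][j] for ch in x) / len(x) for j in range(w)] for i in range(h)]
    mx = [[max(ch[i][j] for ch in x) for j in range(w)] for i in range(h)]
    pooled = [avg, mx]  # concatenation -> 2 x H x W
    k, pad = 7, 3       # 7 x 7 kernel; zero padding of 3 keeps H x W
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = bias
            for c in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:  # out-of-range taps read zero
                            s += weight[c][di][dj] * pooled[c][ii][jj]
            out[i][j] = 1.0 / (1.0 + math.exp(-s))  # sigmoid -> weights in (0, 1)
    return out
```

The returned $H \times W$ map plays the role of $y_{3CA}$ and would then be multiplied element-wise with $y_{1}$ and $y_{2}$.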

Figure 2.

Schematic of the Contour Attention Module.

These weights are then multiplied element-wise with the outputs $y_{1}$ and $y_{2}$ of decoders $θ_{d 1}$ and $θ_{d 2}$, respectively, strengthening the focus on contour regions within $y_{1}$ and $y_{2}$. As a result, the mixed pseudo labels emphasize contour features. With the guidance of the CA module, our model focuses better on critical contour areas, thereby improving segmentation accuracy.

4. Experiments

To validate the effectiveness of our proposed method, we conducted a series of comprehensive experiments on the ACDC, MSCMR, and SegPC-2021 datasets. The ACDC and MSCMR datasets are the most commonly used benchmarks in the field of scribble-based semantic segmentation (Li et al., 2023, 2024; Liu et al., 2023; Luo et al., 2022; Zhang & Zhuang, 2022, 2023). The SegPC-2021 dataset is renowned for its complexity, focusing on multi-object segmentation in bone marrow cell analysis (Gupta et al., 2023; Qiu et al., 2022). Notably, precise segmentation of various tissues within the SegPC-2021 dataset is crucial for the diagnosis of multiple myeloma (MM; Gupta et al., 2023).

4.1. Datasets

The SegPC-2021 dataset is the most comprehensive publicly available dataset for plasma cell segmentation in MM (Gupta et al., 2023), comprising 775 bone marrow smear images captured by two cameras attached to a microscope. These bone marrow smears are well known for their complex tissue structures (Jiang et al., 2022). The dataset includes precise mask annotations for both nuclei and cytoplasm created by 10 experts, making it a highly complex multi-object segmentation dataset that is particularly suitable for research on scribble-based segmentation. We first extracted the individual cells from the dataset into separate images, ensuring that each image contains only one distinct and complete cell. We then used different colors in the scribble annotations to mark the nucleus, cytoplasm, and background of each cell. Following the guidelines in Valvano et al. (2021), we performed scribble annotations on all cells in the dataset, resulting in 2631 cell images with scribble annotations, as shown in Figure 3(a). Each image contains two segmentation targets. Compared with cardiac segmentation datasets, the targets in these cell images occupy a larger portion of the image, making them more sensitive to performance metrics.

Figure 3.

(a) An Example of the SegPC-2021 Dataset, and (b) an Example of the Publicly Available ACDC Dataset. Since MSCMR is Similar to ACDC, it is not Shown Here. In (a) and (b), “Image” is the Raw Image, “Mask Annotation” is the Mask Annotation Superimposed on the Raw Image, “Scribble Annotation” is the Scribble Annotation Superimposed on the Raw Image, and “Contour Auxiliary Label” is the Contour Auxiliary Label Generated During the Preprocessing Stage. In the “Mask Annotation” and “Scribble Annotation” of (a), Red, Blue, Black, and Yellow Represent Nuclei, Cytoplasm, Background, and Unannotated Pixels, Respectively. In the “Mask Annotation” and “Scribble Annotation” of (b), Red, Green, Blue, Black, and Yellow Represent RV, LV, Myo, Background, and Unannotated Pixels, Respectively. Note. RV = Right Ventricle; LV = Left Ventricle; Myo = Myocardium.
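For readers preparing similar data, the color convention of Figure 3(a) can be converted into the class-index maps used for training. The following is a minimal sketch only: the exact RGB values, the class indices, and the use of a separate ignore index for unannotated pixels are assumptions rather than the format of the released annotations, and it presumes losslessly stored scribbles (e.g., PNG) so that colors match exactly.

```python
import numpy as np

# Assumed pure RGB values for the colours named in Figure 3(a).
COLOUR_TO_CLASS = {
    (0, 0, 0): 0,      # background (black)
    (255, 0, 0): 1,    # nucleus (red)
    (0, 0, 255): 2,    # cytoplasm (blue)
}
IGNORE_INDEX = 3       # unannotated pixels (yellow, or any unlisted colour)


def scribble_to_labels(rgb: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) uint8 scribble image to an (H, W) class-index map."""
    labels = np.full(rgb.shape[:2], IGNORE_INDEX, dtype=np.int64)
    for colour, cls in COLOUR_TO_CLASS.items():
        match = np.all(rgb == np.array(colour, dtype=rgb.dtype), axis=-1)
        labels[match] = cls
    return labels


rgb = np.array([[[255, 0, 0], [0, 0, 255]],
                [[0, 0, 0], [255, 255, 0]]], dtype=np.uint8)
labels = scribble_to_labels(rgb)  # nucleus, cytoplasm / background, unannotated
```

Mapping every unlisted color to the ignore index means stray anti-aliased pixels are simply left unsupervised instead of being assigned a wrong class.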

The ACDC dataset (Bernard et al., 2018) comprises cardiac MRI data from 100 patients, with expert annotations for the left ventricle (LV), right ventricle (RV), and myocardium (Myo). Each patient’s data include images from both the end-diastolic and end-systolic phases of the cardiac cycle. Following previous studies (Luo et al., 2022; Valvano et al., 2021), we performed two-dimensional (2D) slice-wise segmentation rather than three-dimensional volumetric segmentation because of the large slice thickness of the volumes. The 2D slice dataset contains 1,902 images in total. Most of these images contain three segmentation targets, while a few contain only two, and the size of the targets varies across images. Valvano et al. (2021) provided scribble annotations for each image; examples of the mask and scribble annotations are shown in Figure 3(b).

The MSCMR dataset (Zhuang, 2016, 2019) consists of late gadolinium-enhanced MRI scans from 45 cardiomyopathy patients, with expert annotations of the LV, RV, and Myo for each scan. Enhanced MRI is more challenging to segment than nonenhanced MRI (Zhang & Zhuang, 2022). Following previous studies (Li et al., 2023; Liu et al., 2023), we used the 2D slice version of the dataset, which contains 686 images in total. As in ACDC, most images contain three segmentation targets, a few contain two, and the size of the targets varies across images. In addition, Zhang and Zhuang (2022) provide scribble annotations for 407 images from 25 scans.

4.2. Implementation Details

We adopted U-Net (Ronneberger et al., 2015) as the base segmentation network and extended it into a three-branch network by adding two auxiliary decoders. A dropout layer ($ratio = 0.5$) is inserted before each convolutional block of $θ_{d 2}$ and $θ_{d 3}$ to introduce perturbations. We implemented our method and all comparison methods in PyTorch (Paszke et al., 2019) with CUDA 11.8 and Python 3.10 and ran them on an RTX 4090 GPU. Experiments were conducted on the SegPC-2021 dataset, the ACDC dataset with the scribbles of Valvano et al. (2021), and the MSCMR dataset with the scribbles of Zhang and Zhuang (2022). During training, input images were normalized to the range $[0, 1]$ and resized to $256 \times 256$.
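The preprocessing described above can be sketched in NumPy as follows. Per-image min-max scaling and nearest-neighbour resizing are assumptions, since the text specifies only the target range and size; images are often resized with bilinear interpolation instead, but label and scribble maps should use nearest-neighbour sampling so that no invalid class indices are created.

```python
import numpy as np


def minmax_normalise(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale the intensities of one image to [0, 1] (per-image min-max; an assumption)."""
    img = img.astype(np.float32)
    return (img - img.min()) / (img.max() - img.min() + eps)


def resize_nearest(arr: np.ndarray, out_h: int = 256, out_w: int = 256) -> np.ndarray:
    """Nearest-neighbour resize of the first two axes.

    Every output value is copied from an input pixel, so label maps keep
    exactly the class indices (including any ignore index) they started with.
    """
    in_h, in_w = arr.shape[:2]
    rows = np.minimum((np.arange(out_h) + 0.5) * in_h / out_h, in_h - 1).astype(int)
    cols = np.minimum((np.arange(out_w) + 0.5) * in_w / out_w, in_w - 1).astype(int)
    return arr[rows[:, None], cols[None, :]]


image = np.random.default_rng(0).integers(0, 4096, size=(212, 186))  # e.g. a raw MR slice
scribble = np.zeros((212, 186), dtype=np.int64)                       # hypothetical label map
x = resize_nearest(minmax_normalise(image))  # float, 256 x 256, values in [0, 1]
y = resize_nearest(scribble)                 # int64, 256 x 256, same label values
```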

We use the dice score coefficient (DSC) and the 95% Hausdorff distance (HD95) as performance metrics. On the SegPC-2021 and ACDC datasets, we employed five-fold cross-validation and report the metrics averaged over the five folds to obtain robust and generalizable estimates. Our ACDC partitioning follows that of Luo et al. (2022). Because the MSCMR dataset has scribble annotations for only 25 of its 45 scans, five-fold cross-validation over all 45 scans is impractical, and restricting it to the 25 annotated scans would leave too little data. We therefore adopted the split of Zhang and Zhuang (2022): 25 scans for training, five for validation, and 15 for testing.
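For reference, a minimal NumPy sketch of both metrics for a single pair of binary 2D masks (evaluated one class at a time). Boundary extraction with 4-connectivity, pixel spacing as a parameter, and raising an error for empty masks are our assumptions; published results are usually computed with an established implementation such as MedPy's `medpy.metric.binary.hd95`, whose edge-case handling may differ.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def boundary(mask: np.ndarray) -> np.ndarray:
    """Foreground pixels that have at least one 4-connected background neighbour."""
    m = mask.astype(bool)
    p = np.pad(m, 1, constant_values=False)
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    return m & ~interior


def hd95(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0)) -> float:
    """95th percentile of the pooled boundary-to-boundary distances in both directions."""
    bp = np.argwhere(boundary(pred)) * np.asarray(spacing)
    bg = np.argwhere(boundary(gt)) * np.asarray(spacing)
    if len(bp) == 0 or len(bg) == 0:
        raise ValueError("HD95 is undefined when either mask is empty")
    d = np.linalg.norm(bp[:, None, :] - bg[None, :, :], axis=-1)  # pairwise distances
    directed = np.concatenate([d.min(axis=1), d.min(axis=0)])     # pred->gt and gt->pred
    return float(np.percentile(directed, 95))


pred = np.zeros((10, 10), dtype=np.uint8)
pred[2:6, 2:6] = 1
gt = np.zeros_like(pred)
gt[2:6, 3:7] = 1  # the same square shifted by one pixel
```

For these two 4 × 4 squares offset by one column, `dice(pred, gt)` is 0.75 and `hd95(pred, gt)` is 1.0 pixel, which shows how the two metrics capture overlap and boundary displacement separately.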

To optimize the joint loss, we used an SGD optimizer with a weight decay of $10^{-4}$, a momentum of 0.9, and an initial learning rate of 0.01. The learning rate was decayed with a polynomial schedule (Luo et al., 2020), and the batch size was set to 16. We trained for 9,000 iterations on the SegPC-2021 dataset, which was sufficient for all compared methods, including ours, to converge. To match the settings of Luo et al. (2022), we trained for 30,000 iterations on ACDC. Following Zhang and Zhuang (2022), we trained for 300 epochs on MSCMR with a batch size of 12.
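The polynomial schedule is commonly written as $lr = lr_{0}(1 - t/T)^{p}$, where $t$ is the current iteration and $T$ the total number of iterations. The sketch below assumes $p = 0.9$, a common choice in public segmentation code; the exponent is not stated in the text, and the helper names are hypothetical.

```python
def poly_lr(base_lr: float, iteration: int, max_iterations: int,
            power: float = 0.9) -> float:
    """Polynomial decay: base_lr * (1 - iteration / max_iterations) ** power."""
    progress = min(iteration, max_iterations) / max_iterations
    return base_lr * (1.0 - progress) ** power


def set_lr(param_groups, lr: float) -> None:
    """Write the new rate into each parameter group (e.g. optimizer.param_groups in PyTorch)."""
    for group in param_groups:
        group["lr"] = lr


# Inside a training loop over 9,000 iterations (the SegPC-2021 setting):
# set_lr(optimizer.param_groups, poly_lr(0.01, iteration, 9000))
```

For the epoch-based MSCMR schedule, `max_iterations` would be the number of epochs multiplied by the iterations per epoch.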

4.3. Comparison With Other State-of-the-art Methods on the ACDC Dataset and MSCMR Dataset

To demonstrate the performance of our method, we compared CMPLS with various state-of-the-art (SOTA) methods. Our comparison methods include:

(1)
Different scribble supervision loss functions: partial cross-entropy only (pCE) (Lin et al., 2016), uncertainty-aware self-ensembling and transformation-consistent model (USTM) (Liu et al., 2021), Mumford–Shah loss (MLoss) (Kim & Ye, 2019), gated CRF loss (GCRF) (Obukhov et al., 2019), entropy minimization (EM) (Grandvalet & Bengio, 2004), and ZScribbleSeg (Zhang & Zhuang, 2023).
(2)
Pseudo-label generation methods: pseudo labels generated by random walker (RW) (Grady, 2006), Scribble2Label (S2L; Lee & Jeong, 2020), spatial–spectral mutual teaching and ensemble learning (S2ME; Wang et al., 2023a), DMPLS (Luo et al., 2022), scribble walking and class-wise contrastive regularization (SC-Net; Zhou et al., 2023), scribble with vision-class embedding (ScribbleVC; Li et al., 2023), ScribbleMatch and reliable-guided pixel alignment (SRPA; Liu et al., 2023), and ScribFormer (Li et al., 2024).
(3)
Mix augmentation methods: CycleMix (Zhang & Zhuang, 2022), TriMix (Zheng et al., 2022), and PCLMix (Lei et al., 2024).
Finally, we also report the fully supervised U-Net (Ronneberger et al., 2015), trained with full mask annotations, as a reference.
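The pCE baseline trains only on scribble-annotated pixels. A minimal NumPy sketch of this loss for one image, assuming softmax probabilities of shape (C, H, W) and a scribble map in which a hypothetical `ignore_index` marks unannotated pixels:

```python
import numpy as np


def partial_cross_entropy(probs: np.ndarray, scribble: np.ndarray,
                          ignore_index: int, eps: float = 1e-8) -> float:
    """Cross-entropy averaged over annotated pixels only.

    probs:    (C, H, W) class probabilities (softmax output).
    scribble: (H, W) integer labels; ignore_index marks unannotated pixels.
    """
    labelled = scribble != ignore_index
    if not labelled.any():
        return 0.0  # no scribble pixels: no supervision signal
    targets = scribble[labelled]                  # (N,) class ids
    p = probs[:, labelled]                        # (C, N) probabilities at those pixels
    picked = p[targets, np.arange(targets.size)]  # probability of the true class
    return float(-np.log(picked + eps).mean())


probs = np.full((2, 2, 2), 0.5)         # two classes, uniform prediction
scribble = np.array([[0, 9], [9, 1]])   # 9 = unannotated in this toy example
loss = partial_cross_entropy(probs, scribble, ignore_index=9)  # about ln 2
```

In PyTorch the same behaviour is usually obtained with `torch.nn.CrossEntropyLoss(ignore_index=...)` applied to the logits.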

On the MSCMR dataset, our method achieves the best mean, Myo, and RV Dice scores among the scribble-supervised approaches, and its LV Dice (92.1%) is within 0.1% of the best (ZScribbleSeg, 92.2%). In particular, CMPLS improves the mean Dice score by 1.1% over the previous SOTA, SRPA (88.5% vs. 87.4%). The gain is particularly evident for Myo and RV, where precise boundary delineation is critical because of their complex and variable anatomical shapes. This indicates that our contour-aware pseudo-label strategy captures subtle structural variations and improves shape representation, aspects that are typically underrepresented in weakly supervised settings.

On the ACDC dataset, by contrast, CMPLS has a slightly lower mean Dice score than TriMix (0.5% lower) and PCLMix (0.4% lower), mainly because of its lower RV Dice (86.0% vs. 87.7% and 88.2%), but it clearly outperforms both methods in boundary accuracy, as measured by HD95. Specifically, CMPLS reduces the mean HD95 by 3.0 mm (from 5.9 mm to 2.9 mm) relative to TriMix and by 1.2 mm (from 4.1 mm to 2.9 mm) relative to PCLMix, and it achieves the lowest HD95 for every structure. These results highlight the advantage of our method in boundary precision, which is crucial for accurate structure delineation, especially in LV and Myo segmentation, where the thin myocardial wall and its interface with the LV cavity involve complex edge geometry.

The discrepancy between Dice and HD95 performance suggests that, although other methods may achieve comparable volumetric overlap, they often fail to capture fine-grained boundary details, which leads to larger localization errors. CMPLS, in contrast, leverages the contour features extracted by the CALS branch and integrates them through a dynamic contour-aware pseudo-labeling mechanism. This allows the network to learn more structure-sensitive features and to maintain high accuracy in both region overlap and shape fidelity.

These findings indicate that our method is robust and generalizes across datasets with different characteristics (late gadolinium-enhanced MR images in MSCMR and cine MR images in ACDC) and across structures with varying morphology and complexity. They also suggest that contour-guided supervision, when effectively integrated, plays an important role in improving segmentation quality under weak supervision.

Regarding the fully supervised baseline, the FullSup row of Table 1 shows that the gap between weakly and fully supervised methods on ACDC is small: our method trails U-Net by only 1.5% in mean Dice (0.883 vs. 0.898) while achieving a lower mean HD95 (2.9 mm vs. 7.0 mm). On MSCMR (Table 2), our method outperforms the basic fully supervised U-Net by 11.5% in mean Dice (0.885 vs. 0.770). These results show the potential of CMPLS for medical image segmentation.

Table 1.
In the Experiments Conducted on the ACDC Dataset, We Compared Our Proposed Method With Other SOTA Methods in the Field of Weakly Supervised Learning.

| Methods | Data | Mean Dice ↑ | Mean HD95 (mm) ↓ | LV Dice ↑ | LV HD95 (mm) ↓ | Myo Dice ↑ | Myo HD95 (mm) ↓ | RV Dice ↑ | RV HD95 (mm) ↓ |
|---|---|---|---|---|---|---|---|---|---|
| S2ME (Wang et al., 2023a) | Scribble | 0.669 | 173.3 | 0.777 | 167.7 | 0.582 | 187.2 | 0.650 | 165.1 |
| RW (Grady, 2006) | Scribble | 0.686 | 10.0 | 0.766 | 9.2 | 0.625 | 11.1 | 0.688 | 9.8 |
| pCE (Lin et al., 2016) | Scribble | 0.788 | 39.0 | 0.844 | 43.8 | 0.813 | 25.8 | 0.708 | 47.4 |
| USTM (Liu et al., 2021) | Scribble | 0.789 | 27.7 | 0.815 | 37.9 | 0.785 | 17.1 | 0.756 | 28.2 |
| S2L (Lee & Jeong, 2020) | Scribble | 0.832 | 102.2 | 0.856 | 139.6 | 0.833 | 54.7 | 0.806 | 112.2 |
| MLoss (Kim & Ye, 2019) | Scribble | 0.839 | 38.9 | 0.876 | 65.2 | 0.809 | 14.6 | 0.832 | 37.1 |
| ScribbleVC (Li et al., 2023) | Scribble | 0.844 | 6.9 | 0.871 | 7.0 | 0.817 | 7.9 | 0.843 | 6.0 |
| EM (Grandvalet & Bengio, 2004) | Scribble | 0.846 | 112.2 | 0.887 | 103.0 | 0.839 | 121.0 | 0.812 | 112.7 |
| GCRF (Obukhov et al., 2019) | Scribble | 0.856 | 9.9 | 0.896 | 7.0 | 0.856 | 7.9 | 0.817 | 9.7 |
| ScribFormer (Li et al., 2024) | Scribble | 0.861 | – | 0.904 | – | 0.851 | – | 0.830 | – |
| DMPLS (Luo et al., 2022) | Scribble | 0.872 | 9.9 | 0.913 | 12.1 | 0.861 | 7.9 | 0.842 | 9.7 |
| ZScribbleSeg (Zhang & Zhuang, 2023) | Scribble | 0.862 | 9.79 | 0.900 | 7.69 | 0.825 | 8.93 | 0.862 | 12.74 |
| SC-Net (Zhou et al., 2023) | Scribble | 0.872 | 6.5 | 0.915 | 8.1 | 0.839 | 6.7 | 0.862 | 4.6 |
| TriMix (Zheng et al., 2022) | Scribble | **0.888** | 5.9 | 0.923 | 4.4 | 0.864 | 4.3 | 0.877 | 8.9 |
| PCLMix (Lei et al., 2024) | Scribble | 0.887 | 4.1 | 0.922 | 3.7 | 0.856 | 3.3 | **0.882** | 5.3 |
| Ours | Scribble | 0.883 | **2.9** | **0.924** | **2.7** | **0.866** | **2.7** | 0.860 | **3.2** |
| FullSup (Ronneberger et al., 2015) | Mask | 0.898 | 7.0 | 0.930 | 8.1 | 0.883 | 5.9 | 0.882 | 6.9 |

Note. The upper bounds are highlighted in blue, while the best results obtained using scribble-based methods are indicated in bold. In the notation of the metrics, $↑$ indicates that a higher value is better, while $↓$ signifies that a lower value is preferable. SOTA = state-of-the-art; LV = left ventricle; Myo = myocardium; RV = right ventricle; HD95=95% Hausdorff distance; S2ME = spatial–spectral mutual teaching and ensemble learning; RW = random walker; pCE = partial cross-entropy; USTM = uncertainty-aware self-ensembling and transformation-consistent model; S2L = Scribble2Label; MLoss = Mumford–Shah loss; ScribbleVC = scribble with vision-class embedding; EM = entropy minimization; GCRF = gated conditional random field; DMPLS = dynamically mixed pseudo label supervision.

Table 2.
In the Experiments Conducted on the MSCMR Dataset, We Compared Our Proposed Method With Other SOTA Methods in the Field of Weakly Supervised Learning.

| Methods | Data | Mean Dice ↑ | LV Dice ↑ | Myo Dice ↑ | RV Dice ↑ |
|---|---|---|---|---|---|
| pCE (Lin et al., 2016) | Scribble | 0.515 | 0.597 | 0.373 | 0.574 |
| USTM (Liu et al., 2021) | Scribble | 0.662 | 0.711 | 0.547 | 0.728 |
| S2L (Lee & Jeong, 2020) | Scribble | 0.766 | 0.842 | 0.731 | 0.842 |
| ScribbleVC (Li et al., 2023) | Scribble | 0.868 | 0.921 | 0.830 | 0.852 |
| EM (Grandvalet & Bengio, 2004) | Scribble | 0.843 | 0.870 | 0.805 | 0.855 |
| GCRF (Obukhov et al., 2019) | Scribble | 0.840 | 0.868 | 0.800 | 0.852 |
| ScribFormer (Li et al., 2024) | Scribble | 0.839 | 0.896 | 0.813 | 0.807 |
| DMPLS (Luo et al., 2022) | Scribble | 0.796 | 0.881 | 0.644 | 0.863 |
| CycleMix (Zhang & Zhuang, 2022) | Scribble | 0.800 | 0.870 | 0.739 | 0.731 |
| ZScribbleSeg (Zhang & Zhuang, 2023) | Scribble | 0.870 | **0.922** | 0.834 | 0.854 |
| SRPA (Liu et al., 2023) | Scribble | 0.874 | 0.912 | 0.827 | 0.883 |
| Ours | Scribble | **0.885** | 0.921 | **0.843** | **0.892** |
| FullSup (Ronneberger et al., 2015) | Mask | 0.770 | 0.850 | 0.721 | 0.738 |

Note. The best results obtained using scribble-based methods are highlighted in bold. LV = left ventricle; Myo = myocardium; RV = right ventricle; pCE = partial cross-entropy; USTM = uncertainty-aware self-ensembling and transformation-consistent model; S2L = Scribble2Label; ScribbleVC = scribble with vision-class embedding; EM = entropy minimization; GCRF = gated conditional random field; DMPLS = dynamically mixed pseudo label supervision; SRPA = ScribbleMatch and reliable-guided pixel alignment.

Furthermore, comparing the S2ME results in Tables 1 and 3 shows that S2ME performs considerably better on the SegPC-2021 dataset (mean Dice 0.840, close to most competing methods) than on the ACDC dataset (mean Dice 0.669, the lowest of all methods). We attribute this to the fact that SegPC-2021 consists of color images, whereas ACDC contains grayscale images: S2ME relies on spatial–spectral pseudo-label generation, which appears to benefit more from color microscopy images than from grayscale MR images. Our method, in contrast, performs consistently on both image types, demonstrating strong robustness and generalization.

Table 3.
Experimental Results Comparing Our Method With Other Methods on the SegPC-2021 Dataset. The Upper Limits are Highlighted in Blue, and the Best Results Obtained Using Scribble-Based Methods are Highlighted in Bold.

| Methods | Data | Mean Dice ↑ | Nucleus Dice ↑ | Cytoplasm Dice ↑ |
|---|---|---|---|---|
| RW (Grady, 2006) | Scribble | 0.795 | 0.880 | 0.71 |
| pCE (Lin et al., 2016) | Scribble | 0.839 | 0.91 | 0.761 |
| EM (Grandvalet & Bengio, 2004) | Scribble | 0.846 | 0.92 | 0.766 |
| MLoss (Kim & Ye, 2019) | Scribble | 0.794 | 0.923 | 0.666 |
| USTM (Liu et al., 2021) | Scribble | 0.844 | 0.923 | 0.765 |
| S2L (Lee & Jeong, 2020) | Scribble | 0.849 | 0.925 | 0.773 |
| GCRF (Obukhov et al., 2019) | Scribble | 0.852 | **0.933** | 0.772 |
| S2ME (Wang et al., 2023a) | Scribble | 0.840 | 0.91 | 0.764 |
| DMPLS (Luo et al., 2022) | Scribble | 0.848 | 0.928 | 0.768 |
| ScribbleVC (Li et al., 2023) | Scribble | 0.85 | 0.932 | 0.772 |
| ScribFormer (Li et al., 2024) | Scribble | 0.855 | 0.929 | 0.781 |
| Ours | Scribble | **0.864** | 0.930 | **0.798** |
| FullSup (Ronneberger et al., 2015) | Mask | 0.950 | 0.983 | 0.917 |

Note. RW = random walker; pCE = partial cross-entropy; EM = entropy minimization; MLoss= Mumford–Shah loss; USTM = uncertainty-aware self-ensembling and transformation-consistent model; S2L = Scribble2Label; GCRF = gated conditional random field; S2ME = spatial–spectral mutual teaching and ensemble learning; DMPLS = dynamically mixed pseudo label supervision; ScribbleVC = scribble with vision-class embedding.

Figure 4 shows a visual comparison of our method with other weakly supervised methods on the ACDC scribble dataset, in which our method shows clear advantages. First, when the target region is surrounded by a very similar region (first row of Figure 4), our method segments Myo and LV more precisely. Second, when the target region occupies a very small proportion of the image (second row of Figure 4), our method still segments it accurately without interference from other regions. Finally, when the imaging around the target region is blurred (third and fourth rows of Figure 4), our method locates and segments the target region more accurately. Overall, our segmentation results show the clearest and most accurate boundaries. Although some comparison methods perform better in certain cases, they are generally slightly inferior to our method in boundary accuracy and detail preservation.

Figure 4.
Intuitive Comparison of Our Proposed Method With Other Weakly Supervised Methods on the ACDC Dataset. (a) Raw Image, (b) Ground Truth, (c) Our Method, (d) pCE, (e) MLoss, (f) USTM, (g) EM, (h) S2L, and (i) DMPLS. Our Method Better Identifies the Boundaries of the Cardiac Structures, and Its Prediction (c) is More Consistent With the Ground Truth. Note. pCE = Partial Cross-Entropy; MLoss = Mumford–Shah Loss; USTM = Uncertainty-Aware Self-Ensembling and Transformation-Consistent Model; EM = Entropy Minimization; S2L = Scribble2Label; DMPLS = Dynamically Mixed Pseudo-Label Supervision.

4.4. Comparison With Other Methods on the SegPC-2021 Dataset

We compared CMPLS with other methods on the SegPC-2021 dataset, as shown in Table 3. Among the scribble-supervised methods, CMPLS achieves the best mean and cytoplasm Dice scores, and its nucleus Dice (0.930) is within 0.3% of the best (GCRF, 0.933). Compared with DMPLS and ScribbleVC, which achieve SOTA results on other datasets, CMPLS improves the mean Dice by 1.6% (84.8% vs. 86.4%) and 1.4% (85.0% vs. 86.4%), respectively. It also surpasses ScribFormer, the strongest competing method on the SegPC-2021 dataset, by 0.9% (85.5% vs. 86.4%).

Figure 5 shows a visual comparison of our method with other weakly supervised methods on the SegPC-2021 dataset, in which our method shows clear advantages. First, where the cytoplasm blends into the background (second and third rows of Figure 5), our method separates the cytoplasm from the background more reliably and segments it more accurately. Second, our method also handles color shifts caused by overlap well (fourth row of Figure 5). Finally, where the nucleus and cytoplasm are very close to each other (first and fifth rows of Figure 5), our method segments the cells more accurately. In particular, our method delineates the boundary between the nucleus and the cytoplasm very well, whereas the other methods tend to make many errors at this boundary.

Figure 5.

Intuitive Comparison of Our Proposed Method With Other Weakly Supervised Methods on the SegPC-2021 Dataset. (a) Raw Image, (b) Ground Truth, (c) Our Method, (d) RW, (e) pCE, (f) USTM, (g) EM, (h) S2L, (i) DMPLS, and (j) U-Net, Where (j) U-Net Corresponds to FullSup in Table 3. Our Method Better Identifies the Cell Boundaries, and Its Prediction (c) is More Consistent With the Ground Truth. Note. RW = Random Walker; pCE = Partial Cross-Entropy; USTM = Uncertainty-Aware Self-Ensembling and Transformation-Consistent Model; EM = Entropy Minimization; S2L = Scribble2Label; DMPLS = Dynamically Mixed Pseudo-Label Supervision.

However, compared with the basic fully supervised U-Net, all weakly supervised methods perform considerably worse on the SegPC-2021 dataset, especially in cytoplasm segmentation. We attribute this to three factors. Compared with the MRI datasets, cells occupy larger areas of the images, so the region of interest contains many more pixels that are not covered by scribbles; the cell boundaries are complex and highly irregular; and the cytoplasm is stained more lightly than the nucleus, which lowers its contrast. These factors limit the performance of all scribble-supervised methods on this dataset. Nevertheless, given the high annotation cost of full supervision (as illustrated in Figure 3(a), full annotation requires substantial effort, especially along the cytoplasm boundary), our method achieves competitive performance at minimal annotation cost, which demonstrates the potential of weakly supervised methods in medical image segmentation.

4.5. Ablation Experiments

To evaluate the effectiveness of contour auxiliary labels supervision (CALS) and contour attention (CA), we performed ablation experiments on the SegPC-2021 and ACDC datasets. As shown in Table 4, the first baseline (“basic”) removes CALS and CA from our method. The second baseline (“basic + CALS w/o ABL”) adds CALS to the first baseline but removes the active boundary loss (ABL) from the loss function. The third baseline (“basic + CALS”) adds the active boundary loss back to the second baseline, and our full method adds CA to the third baseline. The results in Table 4 show that performance improves as each component is added, confirming the effectiveness of CALS and CA. Compared with the basic model, adding CALS and CA increases the mean DSC by 1.6% on the SegPC-2021 dataset (0.848 to 0.864) and by 1.1% on the ACDC dataset (0.872 to 0.883).

Table 4.
Ablation Experiments.

| Methods | SegPC-2021 Mean DSC | ACDC Mean DSC |
|---|---|---|
| Basic | 0.848 | 0.872 |
| Basic + CALS w/o ABL | 0.852 | 0.875 |
| Basic + CALS | 0.856 | 0.878 |
| Ours (basic + CALS + CA) | 0.864 | 0.883 |

Note. DSC = dice score coefficient; CALS = contour auxiliary labels supervision; ABL = active boundary loss; CA = contour attention.

4.6. Sensitivity Experiments

To evaluate how sensitive our model is to the weight $λ_{3}$ of $L_{EDGE}$ in the loss function and to the parameters used to generate the contour pseudo labels, we conducted sensitivity experiments on the SegPC-2021 and ACDC datasets. We report only the mean DSC over the two target classes of SegPC-2021 and over the three target classes of ACDC.

4.6.1. Sensitivity Analysis of $λ_{3}$

In our network, the contour pseudo label plays a crucial role, and $λ_{3}$, the weight of the contour loss, controls how strongly this pseudo label influences training. To select $λ_{3}$, we evaluated the mean DSC on the SegPC-2021 and ACDC datasets with $λ_{3}$ set to 0.01, 0.05, 0.1, 0.5, and 1.0. Figure 6 shows the results, all obtained with five-fold cross-validation. Performance improves as $λ_{3}$ increases from 0.01 to 0.5 and decreases slightly when $λ_{3}$ reaches 1.0, so the best performance is achieved with $λ_{3} = 0.5$. The results also indicate that our method is not sensitive to $λ_{3}$.

Figure 6.

Sensitivity Analysis of Hyper-Parameter $λ_{3}$.

4.6.2. Sensitivity Analysis of Contour Pseudo Label

The quality of the contour pseudo label is crucial in our network. The Canny algorithm (Canny, 1986) has a lower and an upper threshold, and we dilate the edges it produces using OpenCV’s cv2.dilate function. This gives three adjustable parameters for controlling the generated contour pseudo label.

To select these parameters, we evaluated different combinations of the lower threshold, upper threshold, and dilation coefficient (denoted L-U-Coe) on the SegPC-2021 and ACDC datasets. Figure 7 shows how the segmentation results change with L-U-Coe; the best performance is achieved with L-U-Coe set to 30-50-1. Although anomalous changes were observed on the two datasets at 40-60-1 and 50-60-2, respectively, our method was generally insensitive to the L-U-Coe values.
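The contour pseudo-label generation can be sketched as a Canny step followed by dilation. In OpenCV the two steps would typically be `cv2.Canny(image, low, high)` on an 8-bit image and `cv2.dilate(edges, kernel, iterations=coe)`; which image the edges are computed from, and whether Coe sets the number of iterations or the kernel size, are assumptions here. To keep the sketch dependency-free, only the dilation is reimplemented in NumPy (3 × 3 square kernel, Coe interpreted as the number of iterations) and applied to an edge map assumed to come from Canny; the helper names are hypothetical.

```python
import numpy as np


def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a 3x3 square kernel, repeated `iterations` times.

    Intended to match cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=iterations)
    for a 0/1 mask; pixels outside the image are treated as background.
    """
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        p = np.pad(out, 1, constant_values=False)
        grown = np.zeros_like(out)
        for dr in range(3):  # OR together the 9 shifted copies
            for dc in range(3):
                grown |= p[dr:dr + h, dc:dc + w]
        out = grown
    return out


def contour_label(edges: np.ndarray, coe: int = 1) -> np.ndarray:
    """Turn a Canny edge map (0 or 255) into a dilated 0/1 contour label."""
    return dilate(edges > 0, iterations=coe).astype(np.uint8)


# edges = cv2.Canny(image_u8, 30, 50)   # L-U = 30-50, the best setting reported above
# label = contour_label(edges, coe=1)
```

A larger Coe widens the contour band, trading positional precision for more supervised pixels around each boundary.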

Figure 7.

Sensitivity Analysis of Contour Pseudo Label.

4.7. Pseudo-Labels Comparisons Experiments

To verify that our method generates higher-quality pseudo labels, we compared the pseudo labels produced by our model with and without CA and CALS on the SegPC-2021 dataset, as shown in Figure 8 and Table 5.

Figure 8.

Intuitive Comparison of Pseudo Labels, Where “Image” is the Raw Image, “GT” is the Ground-Truth Label, “our” Denotes the Pseudo Labels Generated by Our Model, and “BaseLine” Denotes the Pseudo Labels Generated by Our Model With CA and CALS Removed. Note. CA = Contour Attention; CALS = Contour Auxiliary Labels Supervision.

Table 5.

Pseudo-Label Comparison. The First Column Presents the DSC Between the Ground Truth Labels and the Pseudo Labels Generated by Our Model Without CA and CALS.

| Methods | Baseline | Ours |
|---|---|---|
| DSC | 0.851 | 0.867 |

Note. The second column shows the DSC results calculated between the pseudo-label generated by our model and the ground truth labels. DSC = dice score coefficient; ABL = active boundary loss; CALS = contour auxiliary labels supervision; CA = contour attention.

Figure 8 and Table 5 show that the pseudo labels generated by our model are closer to the ground truth labels than those of the baseline (DSC 0.867 vs. 0.851). These higher-quality pseudo labels provide more reliable supervision for accurately segmenting the nucleus and cytoplasm.

5. Conclusion

In this paper, we designed a contour-aware scribble-supervised method for medical image segmentation. Our main idea is to enhance the model’s ability to recognize target contours accurately and to compensate for the lack of explicit contour information in scribble annotations. We propose a contour auxiliary labels supervision (CALS) module that uses contour pseudo labels to guide the model in learning contour representations. We also propose the CA mechanism, which focuses the model’s learning on key contour regions and thereby yields more accurate segmentation, especially at boundaries. Finally, we constructed an MM plasma cell scribble dataset based on the SegPC-2021 dataset and evaluated CMPLS on it as well as on two benchmark datasets. The results show that our method outperforms other scribble-supervised methods on most metrics, most notably in boundary accuracy.

In future work, we will explore further improvements to scribble-supervised methods on the SegPC-2021 dataset, where the gap to full supervision remains large.

Footnotes

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Chongqing Natural Science Foundation Innovation and Development Joint Fund (CSTB2023NSCQ-LZX0109), Chongqing Technology Innovation & Application Development Key Project (cstb2022tiad-kpx0148), and Fundamental Research Funds for the Central Universities (No. 2022CDJYGRH-001).

Declaration of Competing Interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Chenliang Wang

Longrong Ran

References

Arbeláez

Maire

Fowlkes

C. C.

Malik

(2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 898–916. https://api.semanticscholar.org/CorpusID:206764694

Bernard

Lalande

Zotti

Cervenansky

Yang

Heng

P. A.

Cetin

Lekadir

Camara

Gonzalez Ballester

M. A.

Sanroma

Napel

Petersen

Tziritas

Grinias

Khened

Kollerathu

V. A.

Krishnamurthi

Rohé

M. M.

Jodoin

P. M.

(2018). Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Transactions on Medical Imaging, 37(11), 2514–2525. https://doi.org/10.1109/TMI.2018.2837502

Can

Y. B.

Chaitanya

Mustafa

Koch

L. M.

Konukoglu

Baumgartner

C. F.

(2018). Learning to segment medical images with scribble-supervision alone. In DLMIA/ML-CDS@MICCAI. https://api.semanticscholar.org/CorpusID:49668437

Canny

J. F.

(1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8, 679–698. https://api.semanticscholar.org/CorpusID:13284142

Chaitanya

Karani

Baumgartner

C. F.

Donati

O. F.

Becker

A. S.

Konukoglu

(2019). Semi-supervised and task-driven data augmentation. Information Processing in Medical Imaging, 29–41. https://api.semanticscholar.org/CorpusID:61153373

Chen

Yuan

Zeng

Wang

(2021, June 20–25). Semi-supervised semantic segmentation with cross pseudo supervision. In 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), Nashville, TN, USA (pp. 2613–2622). IEEE. https://doi.org/10.1109/CVPR46437.2021.00264

Chen

Sun

(2023). Weakly-supervised semantic segmentation with image-level labels: From traditional models to foundation models. ArXiv abs/2310.13026. https://api.semanticscholar.org/CorpusID:265936308

Chen

Zhang

Lei

(2020). Digging into pseudo label: A low-budget approach for semi-supervised semantic segmentation. IEEE Access, 8, 41830–41837. https://doi.org/10.1109/ACCESS.2020.2975022

Cheng

Parkhi

O. M.

Kirillov

(2021, June 18–24). Pointly-supervised instance segmentation. In 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), New Orleans, LA, USA (pp. 2607–2616). IEEE. https://api.semanticscholar.org/CorpusID:233219510

10.

Dai

Sun

(2015, December 7–13). BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In 2015 IEEE international conference on computer vision (ICCV), Santiago, Chile (pp. 1635–1643). IEEE. https://api.semanticscholar.org/CorpusID:1613420

11.

Filipiak

Tempczyk

Cygan

(2021). n-CPS: Generalising cross pseudo supervision to n networks for semi-supervised semantic segmentation. ArXiv abs/2112.07528. https://api.semanticscholar.org/CorpusID:245130922

12.

Grady

(2006). Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1768–1783. https://doi.org/10.1109/tpami.2006.233

13. Grandvalet & Bengio (2004). Semi-supervised learning by entropy minimization. In NIPS’04: Proceedings of the 18th international conference on neural information processing systems (pp. 529–536). MIT Press. https://api.semanticscholar.org/CorpusID:7890982

14. Gupta, Gehlot, Goswami, Motwani, Gupta, Faura Á. G., Štepec, Martinčič, Azad, Merhof, Bozorgpour, Azad, Sulaiman, Pandey, Gupta, Bhattacharya, Sinha, Agarwal, & Qiu (2023). SegPC-2021: A challenge & dataset on segmentation of multiple myeloma plasma cells from microscopic images. Medical Image Analysis, 83, 102677. https://doi.org/10.1016/j.media.2022.102677

15. Huo, Xie, Yang, Zhou W. G., & Tian (2021, June 20–25). ATSO: Asynchronous teacher-student optimization for semi-supervised image segmentation. In 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), Nashville, TN, USA (pp. 1235–1244). IEEE. https://api.semanticscholar.org/CorpusID:235703243

16. Jiang, Zhou, Lin Y. M., Chan R. C., Liu, & Chen (2022). Deep learning for computational cytology: A survey. Medical Image Analysis, 84, 102691. https://api.semanticscholar.org/CorpusID:246705827

17. Kim & Ye J. C. (2019). Mumford–Shah loss functional for image segmentation with deep learning. IEEE Transactions on Image Processing, 29, 1856–1866. https://api.semanticscholar.org/CorpusID:202540510

18. Lafferty J. D., McCallum, & Pereira (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML’01: Proceedings of the eighteenth international conference on machine learning (pp. 282–289). Morgan Kaufmann Publishers Inc. https://api.semanticscholar.org/CorpusID:219683473

19. Lee & Jeong (2020). Scribble2Label: Scribble-supervised cell segmentation via self-generating pseudo-labels with consistency. In A. L. Martel, P. Abolmaesumi, D. Stoyanov, D. Mateus, M. A. Zuluaga, S. K. Zhou, D. Racoceanu & L. Joskowicz (Eds.), Medical image computing and computer assisted intervention – MICCAI 2020 – 23rd international conference, Lima, Peru, October 4–8, 2020, proceedings, part I, lecture notes in computer science (Vol. 12261, pp. 14–23). Springer. https://doi.org/10.1007/978-3-030-59710-8_2

20. Lei, Luo, Wang, & Zhang (2024). PCLMix: Weakly supervised medical image segmentation via pixel-level contrastive learning and dynamic mix augmentation. In D. Huang, Z. Si & J. Guo (Eds.), Advanced intelligent computing technology and applications – 20th international conference, ICIC 2024, Tianjin, China, August 5–8, 2024, proceedings, part VI, lecture notes in computer science (Vol. 14867, pp. 62–73). Springer. https://doi.org/10.1007/978-981-97-5597-4_6

21. Li, Zheng, Luo, Shan, & Hong (2023). ScribbleVC: Scribble-supervised medical image segmentation with vision-class embedding. In Proceedings of the 31st ACM international conference on multimedia (pp. 3384–3393). ACM. https://api.semanticscholar.org/CorpusID:260334644

22. Li, Zheng, Shan, Yang, Wang, Zhang Y. T., Hong, & Shen (2024). ScribFormer: Transformer makes CNN work better for scribble-based medical image segmentation. IEEE Transactions on Medical Imaging, 43, 2254–2265. https://api.semanticscholar.org/CorpusID:267412312

23. Lin C. S., Wang C. Y., Wang Y. C. F., & Chen M. H. (2024). SemPLeS: Semantic prompt learning for weakly-supervised semantic segmentation. ArXiv abs/2401.11791. https://api.semanticscholar.org/CorpusID:267069289

24. Lin, Dai, Jia, & Sun (2016, June 27–30). ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation. In 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA (pp. 3159–3167). IEEE. https://api.semanticscholar.org/CorpusID:3121011

25. Liu, Qiao, Shu, & Gao (2023, December 5–8). SRPA: Scribblematch and reliable-guided pixel alignment for scribble-supervised medical image segmentation. In 2023 IEEE international conference on bioinformatics and biomedicine (BIBM), Istanbul, Turkiye (pp. 1325–1330). IEEE. https://doi.org/10.1109/BIBM58861.2023.10385813

26. Liu, Yuan, Gao, Wang, Tang, & Shen (2021). Weakly supervised segmentation of COVID-19 infection with scribble annotation on CT images. Pattern Recognition, 122, 108341. https://api.semanticscholar.org/CorpusID:237576690

27. Luo, Liao, Zhai, Song, Wang, & Zhang (2022, September 18–22). Scribble-supervised medical image segmentation via dual-branch network and dynamically mixed pseudo labels supervision. In Medical image computing and computer assisted intervention – MICCAI 2022: 25th international conference, Singapore (pp. 528–538). Springer-Verlag. https://api.semanticscholar.org/CorpusID:247244927

28. Luo, Liao, Chen, Song, Chen, Zhang, Chen, Wang, & Zhang (2020). Efficient semi-supervised gross target volume of nasopharyngeal carcinoma segmentation via uncertainty rectified pyramid consistency. In Medical image computing and computer assisted intervention – MICCAI 2021: 24th international conference, Strasbourg, France (pp. 318–329). Springer-Verlag. https://api.semanticscholar.org/CorpusID:232110965

29. McEver R. A. & Manjunath B. S. (2020). PCAMs: Weakly supervised semantic segmentation using point supervision. ArXiv abs/2007.05615. https://api.semanticscholar.org/CorpusID:220496354

30. Obukhov, Georgoulis, Dai, & Gool L. V. (2019). Gated CRF loss for weakly supervised semantic image segmentation. ArXiv abs/1906.04651. https://api.semanticscholar.org/CorpusID:184487437

31. Paszke, Gross, Massa, Lerer, Bradbury, Chanan, Killeen, Lin, Gimelshein, Antiga, Desmaison, Köpf, Yang E. Z., DeVito, Raison, Tejani, Chilamkurthy, Steiner, Fang, & Chintala (2019). PyTorch: An imperative style, high-performance deep learning library. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox & R. Garnett (Eds.), Advances in neural information processing systems 32: Annual conference on neural information processing systems 2019, NeurIPS 2019, December 8–14, 2019, Vancouver, BC, Canada (pp. 8024–8035). Curran Associates Inc. https://proceedings.neurips.cc/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html

32. Qiu, Lei, Xie, & Lei (2022, March 28–31). Segmentation of multiple myeloma cells using feature selection pyramid network and semantic cascade mask RCNN. In 2022 IEEE 19th international symposium on biomedical imaging (ISBI), Kolkata, India (pp. 1–4). IEEE. https://api.semanticscholar.org/CorpusID:248406944

33. Ren, Shen, & You (2024). Point-supervised semantic segmentation of natural scenes via hyperspectral imaging. In 2024 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), Seattle, WA, USA (pp. 1357–1367). IEEE.

34. Ronneberger, Fischer, & Brox (2015). U-Net: Convolutional networks for biomedical image segmentation. In Lecture notes in computer science (Vol. 9351, pp. 234–241). Springer.

35. Sohn, Berthelot, Carlini, Zhang, Raffel, Cubuk E. D., & Kurakin (2020). FixMatch: Simplifying semi-supervised learning with consistency and confidence. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan & H. Lin (Eds.), Advances in neural information processing systems 33: Annual conference on neural information processing systems 2020, NeurIPS 2020, December 6–12, 2020, virtual (Article No. 51, pp. 596–608). Curran Associates Inc. https://proceedings.neurips.cc/paper/2020/hash/06964dce9addb1c5cb5d6e3d9838f733-Abstract.html

36. Teh E. W., Devries, Duke, Jiang, Aarabi, & Taylor G. W. (2021, May 31–June 2). The GIST and RIST of iterative self-training for semi-supervised segmentation. In 2022 19th conference on robots and vision (CRV), Toronto, ON, Canada (pp. 58–66). IEEE. https://api.semanticscholar.org/CorpusID:232428221

37. Valanarasu J. M. J., Oza, Hacihaliloglu, & Patel V. M. (2021). Medical transformer: Gated axial-attention for medical image segmentation. In M. de Bruijne, P. C. Cattin, S. Cotin, N. Padoy, S. Speidel, Y. Zheng & C. Essert (Eds.), Medical image computing and computer assisted intervention – MICCAI 2021 – 24th international conference, Strasbourg, France, September 27–October 1, 2021, proceedings, part I, Lecture notes in computer science (Vol. 12901, pp. 36–46). Springer. https://doi.org/10.1007/978-3-030-87193-2_4

38. Valvano, Leo, & Tsaftaris S. A. (2021). Learning to segment from scribbles using multi-scale adversarial attention gates. IEEE Transactions on Medical Imaging, 40(8), 1990–2001. https://doi.org/10.1109/TMI.2021.3069634

39. van de Sande K. E. A., Uijlings J. R. R., Gevers, & Smeulders A. W. M. (2011a). Segmentation as selective search for object recognition. In 2011 international conference on computer vision, Barcelona, Spain (pp. 1879–1886). IEEE. https://doi.org/10.1109/ICCV.2011.6126456

40. van de Sande K. E. A., Uijlings J. R. R., Gevers, & Smeulders A. W. M. (2011b, November 6–13). Segmentation as selective search for object recognition. In 2011 international conference on computer vision, Barcelona, Spain (pp. 1879–1886). IEEE. https://doi.org/10.1109/ICCV.2011.6126456

41. Wang, Zhang, Islam, & Ren (2023a). S²ME: Spatial–spectral mutual teaching and ensemble learning for scribble-supervised polyp segmentation. In H. Greenspan, A. Madabhushi, P. Mousavi, S. E. Salcudean, J. Duncan, T. F. Syeda-Mahmood & R. H. Taylor (Eds.), Medical image computing and computer assisted intervention – MICCAI 2023 – 26th international conference, Vancouver, BC, Canada, October 8–12, 2023, proceedings, part I, lecture notes in computer science (Vol. 14220, pp. 35–45). Springer. https://doi.org/10.1007/978-3-031-43907-0_4

42. Wang, Zhang, Cui, Liu, Ren, Yang, Xie, Hua, & Bao (2021a). Active boundary loss for semantic segmentation. In AAAI conference on artificial intelligence (pp. 2397–2405). https://api.semanticscholar.org/CorpusID:231802302

43. Wang, Gao, Wang, & Long (2021b). Self-tuning for data-efficient deep learning. https://api.semanticscholar.org/CorpusID:232046420

44. Wang & You (2018). Weakly-supervised semantic segmentation by iteratively mining common object features. In 2018 IEEE/CVF conference on computer vision and pattern recognition (pp. 1354–1362). IEEE. https://api.semanticscholar.org/CorpusID:48362544

45. Wang, Zhang, Kan, Shan, & Chen (2023b). BLPSeg: Balance the label preference in scribble-supervised semantic segmentation. IEEE Transactions on Image Processing, 32, 4921–4934. https://api.semanticscholar.org/CorpusID:261064235

46. Wang, Zhao, Zhou, Xing, & Kong (2023c). Conflict-based cross-view consistency for semi-supervised semantic segmentation. In 2023 IEEE/CVF conference on computer vision and pattern recognition (CVPR), Vancouver, BC, Canada (pp. 19585–19595). IEEE. https://api.semanticscholar.org/CorpusID:257280464

47. Yang, Zhuo, Shi, & Gao (2021). ST++: Make self-training work better for semi-supervised semantic segmentation. In 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), New Orleans, LA, USA (pp. 4258–4267). IEEE. https://api.semanticscholar.org/CorpusID:235377324

48. Ying, Huang, Yang, & Cheng (2023). Weakly supervised segmentation of uterus by scribble labeling on endometrial cancer MR images. Computers in Biology and Medicine, 167, 107582. https://doi.org/10.1016/j.compbiomed.2023.107582

49. Zhang & Zhuang (2022, June 18–24). CycleMix: A holistic strategy for medical image segmentation from scribble supervision. In 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), New Orleans, LA, USA (pp. 11646–11655). IEEE. https://api.semanticscholar.org/CorpusID:247222793

50. Zhang & Zhuang (2023). ZScribbleSeg: Zen and the art of scribble supervised medical image segmentation. ArXiv abs/2301.04882. https://api.semanticscholar.org/CorpusID:255749115

51. Zheng, Hayashi, Oda, Kitasaka, & Mori (2022, December 4–8). TriMix: A general framework for medical image segmentation from limited supervision. In Computer vision – ACCV 2022: 16th Asian conference on computer vision, Macao, China, proceedings, part VI (pp. 185–202). Springer-Verlag. https://api.semanticscholar.org/CorpusID:257285442

52. Zhou & Tong R. K. Y. (2023, October 8–12). Weakly supervised medical image segmentation via superpixel-guided scribble walking and class-wise contrastive regularization. In Medical image computing and computer assisted intervention – MICCAI 2023: 26th international conference, Vancouver, BC, Canada, proceedings, part II (pp. 137–147). Springer-Verlag. https://api.semanticscholar.org/CorpusID:263673257

53. Zhu, Zhang, Manmatha, & Smola (2024). Improving semantic segmentation via efficient self-training. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(3), 1589–1602. https://doi.org/10.1109/TPAMI.2021.3138337

54. Zhuang (2016, October 17–21). Multivariate mixture model for cardiac segmentation from multi-sequence MRI. In Medical image computing and computer-assisted intervention – MICCAI 2016: 19th international conference, Athens, Greece, 2016, proceedings, part II (pp. 581–588). Springer-Verlag. https://api.semanticscholar.org/CorpusID:39950451

55. Zhuang (2019). Multivariate mixture model for myocardial segmentation combining multi-source images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(12), 2933–2946. https://doi.org/10.1109/TPAMI.2018.2869576

Methods	Data	Mean Dice ↑	Mean HD95 ↓	LV Dice ↑	LV HD95 ↓	Myo Dice ↑	Myo HD95 ↓	RV Dice ↑	RV HD95 ↓
S2ME (Wang et al., 2023a)	Scribble	0.669	173.3	0.777	167.7	0.582	187.2	0.650	165.1
RW (Grady, 2006)	Scribble	0.686	10.0	0.766	9.2	0.625	11.1	0.688	9.8
pCE (Lin et al., 2016)	Scribble	0.788	39.0	0.844	43.8	0.813	25.8	0.708	47.4
USTM (Liu et al., 2021)	Scribble	0.789	27.7	0.815	37.9	0.785	17.1	0.756	28.2
S2L (Lee & Jeong, 2020)	Scribble	0.832	102.2	0.856	139.6	0.833	54.7	0.806	112.2
MLoss (Kim & Ye, 2019)	Scribble	0.839	38.9	0.876	65.2	0.809	14.6	0.832	37.1
ScribbleVC (Li et al., 2023)	Scribble	0.844	6.9	0.871	7.0	0.817	7.9	0.843	6.0
EM (Grandvalet & Bengio, 2004)	Scribble	0.846	112.2	0.887	103.0	0.839	121.0	0.812	112.7
GCRF (Obukhov et al., 2019)	Scribble	0.856	9.9	0.896	7.0	0.856	7.9	0.817	9.7
ScribFormer (Li et al., 2024)	Scribble	0.861	–	0.904	–	0.851	–	0.830	–
DMPLS (Luo et al., 2022)	Scribble	0.872	9.9	0.913	12.1	0.861	7.9	0.842	9.7
ZScribbleSeg (Zhang & Zhuang, 2023)	Scribble	0.862	9.79	0.900	7.69	0.825	8.93	0.862	12.74
SC-Net (Zhou et al., 2023)	Scribble	0.872	6.5	0.915	8.1	0.839	6.7	0.862	4.6
TriMix (Zheng et al., 2022)	Scribble	0.888	5.9	0.923	4.4	0.864	4.3	0.877	8.9
PCLMix (Lei et al., 2024)	Scribble	0.887	4.1	0.922	3.7	0.856	3.3	0.882	5.3
Ours	Scribble	0.883	2.9	0.924	2.7	0.866	2.7	0.860	3.2
FullSup (Ronneberger et al., 2015)	Mask	0.898	7.0	0.930	8.1	0.883	5.9	0.882	6.9
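The comparison table reports two metrics for each structure (LV, Myo, RV) and for their mean. Dice is better when higher, and the 95th-percentile Hausdorff distance (HD95) is better when lower. The pure-Python sketch below computes both for a single pair of 2D binary masks, as a hedged illustration of what the metrics measure. It is not the evaluation code behind the table. The helper names (`dice`, `surface`, `percentile`, `hd95`) are hypothetical, and the sketch makes three assumptions:

- boundaries are the 4-connected edge pixels;
- distances are brute-force Euclidean;
- the `spacing` argument is an isotropic pixel size.

This roughly follows the convention of MedPy's `medpy.metric.binary.hd95`. Other toolkits may differ, for example in how boundaries are extracted or how empty predictions are scored.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]   # 2D binary mask: 1 = foreground, 0 = background
Point = Tuple[int, int]  # (row, col) pixel coordinate


def dice(pred: Mask, gt: Mask) -> float:
    """Dice coefficient 2|P ∩ G| / (|P| + |G|); defined as 1.0 when both masks are empty."""
    inter = sum(p * g for p_row, g_row in zip(pred, gt) for p, g in zip(p_row, g_row))
    total = sum(map(sum, pred)) + sum(map(sum, gt))
    return 1.0 if total == 0 else 2.0 * inter / total


def surface(mask: Mask) -> List[Point]:
    """Foreground pixels with at least one 4-connected neighbour that is background
    or lies outside the image (the pixels one binary erosion step would remove)."""
    h, w = len(mask), len(mask[0])
    points = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    points.append((r, c))
                    break
    return points


def percentile(values: List[float], q: float) -> float:
    """q-th percentile with linear interpolation between sorted samples."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def hd95(pred: Mask, gt: Mask, spacing: float = 1.0) -> float:
    """95th percentile of the symmetric surface distances between pred and gt.

    `spacing` is an assumed isotropic pixel size; with spacing in mm the result is in mm.
    """
    sp, sg = surface(pred), surface(gt)
    if not sp or not sg:
        raise ValueError("HD95 is undefined when either mask is empty")

    def directed(a: List[Point], b: List[Point]) -> List[float]:
        # Distance from every surface point in `a` to its nearest surface point in `b`.
        return [min(math.dist(p, q) for q in b) * spacing for p in a]

    # Pool both directed distance sets, then take the 95th percentile.
    return percentile(directed(sp, sg) + directed(sg, sp), 95)
```

As a worked example, take two 3×3 squares offset by one column inside a 5×5 image. They overlap in 6 pixels, so Dice = 2·6/(9+9) = 2/3, and every boundary pixel lies within one pixel of the other boundary, so HD95 = 1.0 pixel. Brute-force nearest-neighbour search keeps the sketch short; for full-size slices, a Euclidean distance transform is the usual choice.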

Scribble-Supervised Medical Image Segmentation Using Contour Mixed Pseudo Labels Supervision and Contour Auxiliary Labels Supervision


Table 4. Ablation Experiments.

Methods	SegPC-2021 Mean DSC	ACDC Mean DSC
Basic	0.848	0.872
Basic + CALS w/o ABL	0.852	0.875
Basic + CALS	0.856	0.878
Ours (Basic + CALS + CA)	0.864	0.883
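Read as a cumulative ablation, Table 4 attributes to each added component the difference between consecutive rows. The short Python sketch below makes that arithmetic explicit. `incremental_gains` is a hypothetical helper, not part of the authors' released code, and the sketch rests on two assumptions:

- each row adds exactly one component to the row above, as the row labels suggest;
- "ABL" denotes the active boundary loss (Wang et al., 2021a), which the table itself does not spell out.

```python
# Mean DSC values copied from Table 4, ordered from the baseline to the full model:
# (row label, SegPC-2021 mean DSC, ACDC mean DSC).
ROWS = [
    ("Basic", 0.848, 0.872),
    ("Basic + CALS w/o ABL", 0.852, 0.875),
    ("Basic + CALS", 0.856, 0.878),
    ("Ours (Basic + CALS + CA)", 0.864, 0.883),
]


def incremental_gains(rows):
    """Return (label, SegPC-2021 gain, ACDC gain) for each row relative to the row above.

    Gains are rounded to 3 decimals, the precision reported in the table.
    """
    gains = []
    for (_, prev_segpc, prev_acdc), (label, segpc, acdc) in zip(rows, rows[1:]):
        gains.append((label, round(segpc - prev_segpc, 3), round(acdc - prev_acdc, 3)))
    return gains


for label, d_segpc, d_acdc in incremental_gains(ROWS):
    print(f"{label:<26} SegPC-2021 {d_segpc:+.3f}  ACDC {d_acdc:+.3f}")
```

Under this reading, the steps add +0.004, +0.004 and +0.008 mean DSC on SegPC-2021, and +0.003, +0.003 and +0.005 on ACDC. The final step, adding CA, gives the largest single gain on both datasets. The full model improves on the baseline by 0.016 (SegPC-2021) and 0.011 (ACDC) mean DSC.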

4.6.1. Sensitivity Analysis of $\lambda_{3}$