
Abstract

Fiber masterbatch production suffers from the inherent agglomeration of high-concentration color masterbatches, which degrades color uniformity, particle dispersion, and thermal stability in the masterbatch and, ultimately, the quality of fiber products. However, accurately recognizing and classifying agglomeration in scanning electron microscopy (SEM) images remains a challenge owing to the complexity of agglomeration states, the difficulty of distinguishing micro-size agglomerates, and limited data availability. To address this challenge, this paper proposes a novel microstructure recognition architecture, named Micro-KTNet, which is designed for segmenting SEM images of fiber masterbatch aggregation. First, Micro-KTNet leverages transferable microstructural features to initialize the proposed network, mitigating the impact of limited data. Then, to handle multi-scale features in the microstructure, an encoding-decoding structure is constructed and skip connections are used to transfer the output features of the encoding layers. Finally, the decoder incorporates a novel attention module to effectively process both global and local features. To evaluate Micro-KTNet, a unique fiber masterbatch SEM database is built as the benchmark for micro-size agglomeration recognition. Experimental results indicate that Micro-KTNet surpasses existing state-of-the-art methods and improves the recognition precision from 75.37% to 87.04%.

Keywords

Fiber masterbatch, agglomeration effects, encoder-decoder framework, transfer learning, microstructure segmentation, attention module

Introduction

Fiber masterbatch, a fundamental component within the chemical fiber industry, is a blend of different materials. It mainly comprises high-level thermoplastic polymers, micro-nano functional powders, and various additives. The complex industrial fabrication process of fiber masterbatch encompasses multiple stages such as mixing raw materials, extruding the mixture, cooling, and quality inspection. However, during the formation of fiber masterbatches, particle agglomeration is unavoidable due to charge forces between particles and the action of dispersants.¹ In the mixture formed by the raw materials, the particles develop significant positive and negative charges on their surfaces. Particles of inconsistent sizes can then accumulate and stick together due to the charge force between them. While inherent to the manufacturing process, particle agglomeration has a negative impact on the quality and performance of the final products.² Traditionally, researchers have primarily evaluated agglomeration effects by focusing on filtration efficiency.³ Additionally, industry experts in the fiber field can rely on their expertise to manually evaluate agglomeration by analyzing captured scanning electron microscope (SEM) images.⁴ However, due to the low efficiency and high labor intensity associated with human inspection, automated detection based on machines or computers has drawn considerable attention in recent decades.

At present, deep learning employs multi-layer neural networks to extract high-level abstract features from data and has achieved remarkable success.⁵ It has significantly enhanced the performance of tasks such as image classification,⁶ object detection,⁷ and semantic segmentation,⁸ owing to its self-learning and training capabilities.⁹ In the field of fiber textiles, deep learning has shown remarkable efficacy in textile image classification,¹⁰ optimizing chemical fiber processes,¹¹ and online detection of fiber quality.¹² For instance, Wei et al.¹³ proposed an integrated learning framework called the bio-inspired visual integrated framework (BIVI-ML) for efficiently classifying multi-label textile defects. In this work, three bio-inspired visual mechanisms (the visual gain mechanism, the visual attention mechanism, and the visual memory mechanism) were proposed and built within the BIVI-ML. In addition, Tan et al.¹⁴ proposed a scale and context information fusion network for multi-type medical image segmentation, called SCIF-Net. SCIF-Net effectively captures multi-scale features and spatial information, thereby enabling accurate segmentation of diverse human tissue structures. Additionally, Wu et al.¹⁵ proposed a novel generative adversarial network (GAN) model, called SSGAN-ASP. SSGAN-ASP is designed to enhance fundus images by preserving anatomical structures and employing semi-supervised learning, demonstrating improved visual quality and diagnostic accuracy. Moreover, Wei et al.¹⁶ proposed a novel detection model, called Faster VG-RCNN, to tackle the challenge of identifying small-scale textile defects. By incorporating attention-related visual gain mechanisms into Faster RCNN, this approach promises to enhance the accuracy and reliability of defect detection in textile manufacturing processes. Du et al.¹⁷ applied artificial intelligence technology to waste textile identification and sorting, developing two online NIR qualitative identification models covering thirteen kinds of waste textiles using a convolutional neural network (CNN) and Baidu's deep learning platform PaddlePaddle. Furthermore, for the task of textile defect recognition with few labeled samples, a semi-supervised spatial-spectral neural network¹⁸ was proposed, which could capture both spectral and spatial feature information from textile images. Experimental results showed that the textile defect recognition model achieved outstanding performance in a series of textile defect recognition challenges. Meanwhile, for identifying the internal microstructure of fiber masterbatch, scanning electron microscopy (SEM) imaging is a well-established methodology at different length scales, allowing for in-depth structural investigation and evaluation of physical properties.

In current research on SEM microstructure recognition, several deep neural network models have demonstrated impressive performance. For example, Quan et al.¹⁹ proposed a deep fully residual convolutional neural network for image segmentation in connectomics, called FusionNet. FusionNet introduces sum-based skip connections to deepen the network structure and enhance the segmentation accuracy across several electron microscopy segmentation datasets. Furthermore, Chen et al.²⁰ proposed a Transformer-based attention guided network, called TransAttUnet, to advance automatic medical image segmentation. By integrating self-aware attention modules and multi-scale skip connections, TransAttUnet captures long-range contextual dependencies, thereby enhancing semantic segmentation accuracy. Additionally, Wei et al.²¹ introduced a methodology utilizing deep learning feature pyramid networks (FPN) to tackle the complexities of multi-class fabric defect detection, addressing challenges such as intersecting defects and small-size anomalies. Meanwhile, Zhou et al.²² developed a physics-informed deep learning framework to predict the strength of composite materials with microstructural uncertainties using RVE images generated by a random fiber packing algorithm. The framework significantly accelerates uncertainty analysis and outperforms traditional methods, with its effectiveness validated through systematic case studies and statistical cross-validation. Besides, Zeng et al.²² proposed an improved neural network based on Unet for nuclei segmentation in histology images. Moreover, Zhang et al.²³ proposed a very deep residual channel attention network (RCAN) for electron microscopy image super-resolution (SR). However, these models share a common limitation: data dependency. They rely on extensive annotated datasets for training, and acquiring high-quality annotations for SEM images can be challenging and time-consuming in practice. This limitation restricts the generalization capability and robustness of these models. To address this issue, transfer learning methods are often employed. For instance, Tamarina et al.²⁴ proposed a transfer learning approach using convolutional neural networks (CNNs) pre-trained on the ImageNet dataset. This method uses the CNN layers pre-trained on ImageNet as feature extractors, and the extracted features are then passed to a support vector machine with a linear kernel. However, this method still faces some issues in practice. On the one hand, there may be feature distribution differences and missing data labels between the source domain and the target domain. In such cases, it is necessary to identify and utilize the similarities between the two domains to transfer knowledge from the source domain to the target domain. On the other hand, the knowledge learned in the source domain may negatively impact learning in the target domain.

In this paper, building upon the advancement of deep learning for SEM microstructure recognition, we explore a microstructure knowledge transfer learning-based agglomeration recognition network (Micro-KTNet), a topic that has not been widely recognized or studied but is of significant importance. The integration of several modules in the proposed model focuses on the dispersion superstructure and the action mechanism of parent particles. Furthermore, the proposed Micro-KTNet utilizes feature learning and complex modeling capabilities to segment aggregated particles, showcasing its recognition performance in the fiber masterbatch industrial process. The primary contributions of this paper are:

1. We propose a novel end-to-end framework for masterbatch agglomeration recognition with few labeled SEM images, which recasts agglomeration recognition as a microstructure knowledge transfer learning task.

2. We present a novel attention encoding-decoding structure, in which the encoder with skip connections produces multi-scale features and the decoder with an attention module processes both global and local features effectively.

3. We build a benchmark dataset for masterbatch agglomeration to evaluate the empirical performance of Micro-KTNet. Experimental results show that the proposed model achieves outstanding performance in masterbatch agglomeration recognition with few labeled SEM images.

The remainder of this paper is organized as follows. The Related works section briefly summarizes the basic concepts of transfer learning and the encoder-decoder. Details of the proposed framework are presented in The proposed Micro-KTNet section. Performance comparisons between the proposed approach and current state-of-the-art approaches are described in the Experimental evaluation and discussion section. The paper is concluded in the Conclusions and future work section.

Related works

Transfer learning

Transfer learning aims to improve the performance of a target task $T_{T}$ in a target domain $D_{T}$ by leveraging knowledge from a source domain $D_{S}$ and a source task $T_{S}$, where $D_{S} \neq D_{T}$ or $T_{S} \neq T_{T}$.²⁵ In this context, a domain $D$ is closely tied to the distribution of the training data. It is characterized by a feature space $\mathcal{X}$ and a marginal probability distribution $P(X)$, where $X = \{x_{1}, \ldots, x_{n}\} \in \mathcal{X}$. Two domains differ when their feature spaces or marginal probability distributions differ. The task $T$ is learned from training data $\{x_{i}, y_{i}\}$, where $x_{i} \in X$ and $y_{i} \in Y$. The objective of transfer learning, particularly deep transfer learning, is to enhance the function $f_{T}(\cdot)$ in the target task $T_{T}$ by incorporating knowledge from source tasks $T_{S}$. Generally, these source tasks involve a larger source domain $D_{S}$ compared to the target domain $D_{T}$. It is generally believed that the features extracted by the initial layers of neural networks are universal. Recently, transfer learning techniques were broadly classified into four types based on their applications: instance-based, mapping-based, network-based, and adversarial-based.²⁶ Among these, the most prevalent approach in practical applications is network-based transfer learning, which capitalizes on the inherent structure of networks to enhance performance and adaptability. Research²⁷ demonstrated that image representations learned by convolutional neural networks (CNN) using large-scale annotated datasets could effectively transfer to other visual recognition tasks with limited training data. As a result, many visual-related studies²⁸⁻³⁰ commonly initialized parameters with models pre-trained on the large-scale ImageNet dataset and then fine-tuned or froze specific parameters. In the task of ultrastructural image segmentation, however, Stuckner et al.³¹ employed a model pre-trained on the large-scale MicroNet microscopy dataset as the initial parameters for the image segmentation network. For the transfer of heterogeneous neural networks, an encoder-decoder network is a commonly used architecture.

Encoder-decoder

The encoder-decoder is an important network structure in deep learning, which comprises two components: the encoder and the decoder. The encoder is responsible for transforming the input into an intermediate state known as the feature, while the decoder decodes the feature to produce the desired output. As the encoder-decoder is a generic framework rather than a specific model, it has the flexibility to handle various types of data.³²⁻³⁴ The encoder-decoder architecture also facilitates the design of diverse models. For example, recurrent neural networks (RNN) can be arranged as encoder-decoder structures: the encoder converts input text into vectors, and the decoder decodes the vectors into the desired outputs. Consequently, the RNN encoder-decoder architecture finds its application in sequence-to-sequence tasks, where the input and output are sequences of varying lengths. This approach is commonly known as the encoder-decoder Seq2Seq model.³⁵ In this model, the RNN can be a simple RNN, a long short-term memory (LSTM), or a gated recurrent unit (GRU). For a simple RNN, each intermediate state of the encoder, denoted by $H_{t}$ (encoder), is computed using equation (1)

$$H_{t}(\mathrm{encoder}) = \varphi \left( W_{HH} \times H_{t-1} + W_{HX} \times X_{t} \right) \tag{1}$$

where $\varphi$ represents the activation function, $W_{HH}$ is the weight matrix connecting intermediate states, and $W_{HX}$ is the weight matrix connecting the input and the intermediate states. The hidden state of the decoder, $H_{t}$ (decoder), is calculated as in formula (2)

$$H_{t}(\mathrm{decoder}) = \varphi \left( W_{HH} \times H_{t-1} \right) \tag{2}$$

The decoder output is given by formula (3)

$$Y_{t} = H_{t}(\mathrm{decoder}) \times W_{HY} \tag{3}$$

where $W_{HY}$ is the weight matrix connecting the intermediate state and the decoder output.

In addition to these examples, RNN-based encoder-decoder models have been utilized in various applications. For instance, Cho et al.³² proposed an RNN encoder-decoder model that employs an RNN as the encoder, encoding a symbolic sequence into a fixed-length vector representation. Another RNN was employed as the decoder to decode the representation into another symbolic sequence, thus achieving machine translation. Additionally, Wang et al.³² presented a generator that utilizes an RNN to generate Chinese poetry. Furthermore, Serban et al.³² introduced a hierarchical latent variable encoder-decoder model for generating dialogue, which involves a two-step generation process: sampling the latent variables and then generating the output sequence conditioned on a random latent variable. This model can capture long-term context and model sequences with a hierarchical structure. In a variation, Park et al.³² replaced the recurrent neural network (RNN) with long short-term memory (LSTM) and proposed an LSTM-based encoder-decoder model that allows sequence-to-sequence prediction of vehicle trajectories.

Similarly, encoder-decoder architectures can also be applied to convolutional neural networks (CNN). In CNNs, images pass through convolutional layers, followed by linear layers, and eventually yield classification results. The convolutional layers are responsible for extracting features, while the linear layers are utilized for result prediction. Consequently, the feature extraction component can be regarded as the encoder, converting the original image into an intermediate representation suitable for machine learning, and the decoder transforms the intermediate representation into the desired output. The CNN-based encoder-decoder architecture is particularly useful for image processing tasks. For instance, Yasrab et al.³⁶ proposed a simplified CNN-based encoder-decoder model for semantic segmentation. The encoder network is based on the twelve convolutional layers of VGG-16, omitting pooling layers, while the decoder network employs upsampling and transpose convolutional units, culminating in a pixel-level classification layer. Additionally, Badrinarayanan et al.³⁷ introduced a practical fully convolutional neural network framework based on the encoder-decoder architecture. The encoding component has a topological structure analogous to VGG-16, while the decoder section utilizes a novel upsampling technique. Specifically, the decoder employs the pooling indices computed during the corresponding encoder's max pooling steps to accomplish non-linear upsampling. Moreover, DeepLabv3+³⁸ expands on DeepLabv3 by introducing a simple yet effective decoder module to refine segmentation results, particularly at object boundaries. Furthermore, Li et al.³⁹ achieved semantic segmentation with minimal data by utilizing a CNN-based encoder-decoder model, enabling training with small sample sizes.
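The simple RNN encoder-decoder of equations (1)–(3) can be illustrated with a minimal sketch in plain Python. The choice of tanh for the activation, the zero initial state, the matrix-times-vector ordering, and the toy weight values below are all assumptions made for illustration; the source specifies none of them.

```python
import math

def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def activation(v):
    """Activation function phi; tanh is an assumed choice."""
    return [math.tanh(x) for x in v]

def encode(inputs, W_hh, W_hx, hidden_size):
    """Equation (1): H_t = phi(W_HH H_{t-1} + W_HX X_t), with H_0 = 0 (assumed)."""
    h = [0.0] * hidden_size
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W_hh, h), matvec(W_hx, x))]
        h = activation(pre)
    return h

def decode(h, W_hh, W_hy, steps):
    """Equations (2)-(3): H_t = phi(W_HH H_{t-1}), then Y_t from H_t and W_HY."""
    outputs = []
    for _ in range(steps):
        h = activation(matvec(W_hh, h))
        outputs.append(matvec(W_hy, h))
    return outputs

# Toy weights (hypothetical values): 1-dimensional inputs, 2 hidden units, 1 output.
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_hx = [[1.0], [-1.0]]
W_hy = [[1.0, 1.0]]

context = encode([[1.0], [0.5]], W_hh, W_hx, hidden_size=2)
outputs = decode(context, W_hh, W_hy, steps=3)
```

The encoder compresses an input sequence of any length into the fixed-size vector `context`, and the decoder unrolls it for as many steps as requested, which is why input and output lengths can differ.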

The proposed Micro-KTNet

System overview

In this study, we define the task as the recognition of the microstructure aggregation of fiber parent particles at the pixel level in SEM images. As depicted in Figure 1, our proposed framework comprises three main components: (a) transfer learning for microstructure features, (b) the SENetEncoder, and (c) the UnetDecoder.

Figure 1.

The proposed method, the Micro-KTNet Encoder-Decoder network, comprises three key components: transfer learning microstructure features, SENetEncoder, and UNetDecoder. The transferable microstructural features are used to initialize the proposed network. The SENetEncoder reuses convolutional layers trained on MicroNet to encode the input image into feature maps. The UNetDecoder is responsible for decoding these feature maps. Finally, the SegmentationHead processes the decoded feature maps to obtain per-pixel class probabilities, generating the segmentation mask.

The complete structure of Micro-KTNet, comprising its three key modules (the transfer learning microstructure features, the SENetEncoder module, and the UNetDecoder module), is illustrated in Figure 1. Detailed information regarding the training parameters and the learning process for the Micro-KTNet network is provided in the Learning the proposed network section.

Traditionally, the weights of a neural network are initialized randomly. However, this may lead to overfitting, which occurs when the model fits too closely to the training data and fails to generalize effectively to new data. This challenge is more likely in the domain of fiber parent particle microstructure aggregation, where data is scarce and creating segmentation masks is costly. To address this, we adopt a transfer learning approach to initialize our network. Initially, a batch of pre-processed images, along with their corresponding ground truth, is fed into the model. The SENetEncoder is then employed to encode these inputs into feature maps with various resolutions, which contain valuable semantic information. Subsequently, the UnetDecoder is used to upsample these feature maps. The segmentation head is applied to the upsampled output to generate a segmentation mask. This mask contains class probabilities for each pixel, facilitating the identification of fiber parent particle microstructure aggregation. Eventually, the network delivers classification results for each pixel. Throughout the learning process, the segmentation network extracts feature information related to fiber parent particle microstructure aggregation from input images through forward propagation. The loss is computed during forward propagation, and network parameters are adjusted through backward propagation to minimize this loss, thereby optimizing the network's performance in pixel-wise segmentation tasks. Ultimately, the output of the segmentation network segments fiber parent particle microstructure aggregation, enabling automated analysis of specific structures within the images.

Transfer learning-based microstructure features

Collecting and manually creating segmentation masks is challenging and costly, resulting in a relatively small self-made dataset. To mitigate the risk of overfitting, we utilize transfer learning to initialize our network. Initially, we train a ResNeXt50 network for classification tasks using a large microscope dataset. The objective is to learn a target function $f_{\theta}$ on the given source domain $D_{source}$ by training a set of parameters $\theta^{*}$ according to formula (4)

$$\theta^{*} = \arg\min_{\theta} L(D_{source}; \theta) \tag{4}$$

where $\theta^{*}$ represents the optimal model parameters, $\theta$ represents the parameters of the function, and $L$ denotes the loss function. Previous studies have shown that the early layers of a network learn generic features, while the deeper layers focus more on specific features. By utilizing a model pre-trained on the large MicroNet microscope dataset,³⁹ we enable our model to learn to detect higher-level microscopic structural features. Therefore, we choose all the convolutional layers of the ResNeXt50 network for transfer learning. Let the parameters of the SENetEncoder part be $\theta_{1}$, which is equivalent to $\theta$, and let the parameters of the UnetDecoder part be $\theta_{2}$. In the target domain $D_{target}$, we learn a target function $f_{\theta}$ by training the set of parameters $(\theta_{1}, \theta_{2})$ as per formula (5)

$$\theta^{*} = \arg\min_{(\theta_{1}, \theta_{2})} L(D_{target}; (\theta_{1}, \theta_{2})) \tag{5}$$

where $\theta^{*}$ represents the optimal model parameters and $\theta_{2}$ is randomly initialized. The loss function $L(\cdot)$ is a weighted combination of the binary cross-entropy (BCE) loss and the Dice loss. Let the predicted binary segmentation mask be $P$ and the true binary segmentation mask be $G$. The BCE loss is defined as formula (6)

$$BCELoss(P, G, \theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[ G_{i} \times \log(f_{i|\theta}) + (1 - G_{i}) \times \log(1 - f_{i|\theta}) \right] \tag{6}$$

where $N$ represents the number of pixels, $G_{i}$ is the value of the $i$-th pixel in the true binary segmentation mask (0 or 1), and $f_{i|\theta}$ is the corresponding predicted probability. The Dice loss is defined as formula (7)

$$DiceLoss(P, G, \theta) = 1 - \frac{2 \sum_{i=1}^{N} P_{i} \times G_{i}}{\sum_{i=1}^{N} P_{i}^{2} + \sum_{i=1}^{N} G_{i}^{2}} \tag{7}$$

The overall loss function of transfer learning is defined as formula (8)

$$Loss(\theta) = \partial \times BCELoss(P, G, \theta) + (1 - \partial) \times DiceLoss(P, G, \theta) \tag{8}$$

where $\partial$ is a hyperparameter that controls the weighting between the BCE loss and the Dice loss.
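As a concrete reading of formulas (6)–(8), the sketch below evaluates the BCE loss, the Dice loss, and their weighted combination on flattened pixel lists in plain Python. The flat-list representation, the small `eps` terms that guard against `log(0)` and division by zero, and the function names are assumptions for illustration; the default weight of 0.7 reflects the 70% BCE weighting reported in the training setup.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Formula (6): mean binary cross-entropy over N pixels."""
    total = 0.0
    for p, g in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0); eps is an assumption
        total += g * math.log(p) + (1 - g) * math.log(1 - p)
    return -total / len(pred)

def dice_loss(pred, target, eps=1e-7):
    """Formula (7): 1 - 2*sum(P*G) / (sum(P^2) + sum(G^2))."""
    intersection = sum(p * g for p, g in zip(pred, target))
    denominator = sum(p * p for p in pred) + sum(g * g for g in target)
    return 1.0 - 2.0 * intersection / (denominator + eps)

def combined_loss(pred, target, weight=0.7):
    """Formula (8): weight * BCE + (1 - weight) * Dice."""
    return weight * bce_loss(pred, target) + (1 - weight) * dice_loss(pred, target)

target = [1, 0, 1, 0]
uncertain = [0.5, 0.5, 0.5, 0.5]   # model is undecided on every pixel
confident = [1.0, 0.0, 1.0, 0.0]   # model matches the ground truth

loss_uncertain = combined_loss(uncertain, target)
loss_confident = combined_loss(confident, target)
```

For the undecided prediction the BCE term equals ln 2 and the Dice term equals 1/3, while a prediction that matches the mask drives both terms close to zero.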

SENetEncoder module for few-labeled SEM images

Because labeled SEM images of the masterbatch agglomeration microstructure are scarce, we repurpose the convolutional layers of the ResNeXt50 network as part of the SeNetBlock. Figure 2 illustrates the structure of the SeNetBlock with downsampling, which comprises several processing blocks. The input image goes through convolutional layers, batch normalization layers, and nonlinear activation layers to generate feature maps as in formulas (9)–(11)

$$U_{cj}^{l} = \sum_{i \in M_{j}} X_{cj}^{l-1} * K_{ij}^{l} \tag{9}$$

$$O_{cj}^{l} = \frac{U_{cj}^{l} - \mu_{cj}^{l}}{\sqrt{\sigma_{cj}^{l} + \epsilon}} \cdot \gamma_{cj}^{l} + \beta_{cj}^{l} \tag{10}$$

$$X_{cj}^{l} = f(O_{cj}^{l}) \tag{11}$$

where $f(\cdot)$ represents the ReLU activation function, $X_{cj}^{l-1}$ denotes the $j$-th channel of the feature map at the $(l-1)$-th layer, $K_{ij}^{l}$ represents the convolutional kernel, $M_{j}$ signifies the input subset, '$*$' denotes the convolution operation, and $U_{cj}^{l}$ represents the network activation after the $l$-th layer's convolution operation. In formula (10), $\mu_{cj}^{l}$ and $\sigma_{cj}^{l}$ are the batch mean and variance, and $\gamma_{cj}^{l}$ and $\beta_{cj}^{l}$ are the learnable scale and shift of batch normalization.

As shown in Figure 2, the first processing block consists of a 1 × 1 convolution (Conv1), a batch normalization layer (bn1), and a nonlinear activation layer (ReLU). The second processing block includes a 3 × 3 convolution (Conv2), a batch normalization layer, and a nonlinear activation layer. In the third processing block, the input $X^{2}$ goes through a 1 × 1 convolution layer (Conv3) and a batch normalization layer to obtain $O^{3}$. $O^{3}$ is then divided into two parts, with one part passing through the SEModule. The SEModule is composed of a pooling layer, a 1 × 1 convolutional layer, and a nonlinear activation layer. Within it, $O^{3}$ undergoes average pooling, a 1 × 1 convolution (FC1), a rectified linear unit, a further 1 × 1 convolution (FC2), and finally the sigmoid function. The output of the SEModule is $X^{3}$, which is applied to $O^{3}$ by element-wise multiplication. At the same time, the block input goes through a 1 × 1 convolution (Conv0) and a batch normalization layer to obtain $O^{4}$

$$O^{4} = f_{bn}(Conv(X^{0})) \tag{12}$$

$$X^{4} = f_{relu}(O^{4} + X^{3} \odot O^{3}) \tag{13}$$

where $X^{0}$ is the input of the SeNetBlock, $\odot$ denotes element-wise multiplication, and $+$ denotes element-wise addition. The output $X^{4} \in \mathbb{R}^{k_{l} \times F_{l} \times T_{l}}$ of the fourth layer is the output of the SeNetBlock. Similarly, for the SeNetBlock without a downsampling module, $O^{4}$ is the input data.
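Formulas (10) and (11) can be made concrete with a short plain-Python sketch that normalizes the activations of a single channel and then applies ReLU. It assumes the term under the square root in formula (10) is the variance of the channel, and the default values for the scale, shift, and `eps` are illustrative rather than taken from the source.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Formula (10) for one channel: subtract the mean, divide by
    sqrt(variance + eps), then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) * gamma + beta for v in values]

def relu(values):
    """Formula (11): f(x) = max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]

activations = [1.0, 2.0, 3.0, 4.0]   # toy convolution outputs for one channel
normalized = batch_norm(activations)
features = relu(normalized)
```

After normalization the channel has zero mean, so ReLU zeroes the below-average activations and keeps the above-average ones.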

Figure 2.

The diagram illustrates the structure of the SeNetBlock with Downsampling.

Figure 2 illustrates the hierarchical structure of the encoder path, which comprises five layers. Each layer takes as input either the original image or the encoding feature map from the previous layer. The output of each layer is both forwarded to the next layer and directly transmitted to the decoder through skip connections. The initial layer of the encoder consists of a convolutional layer with a 7 × 7 kernel size, a stride of 2, and a padding size of 3. This convolutional layer's primary purpose is to extract initial features, which are subsequently subjected to batch normalization and ReLU activation before being downsampled by a max pooling layer. This operation reduces the feature map's size by half while keeping the number of channels unchanged. The first SeNetBlock in the encoder is redesigned for downsampling feature information. Each downsampling step diminishes the feature map's size by half while doubling the number of channels. The encoder progressively reduces resolution through convolution and pooling operations, generating multi-level features with varying resolutions. These features are subsequently transmitted to the decoder via skip connections. In contrast, the decoder gradually increases resolution through operations such as transposed convolution. The skip connections serve two key purposes: first, they enable high-resolution feature information to flow between the downsampling and upsampling processes; second, they help preserve finer details when handling small targets or boundaries.
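The spatial sizes produced by the encoder can be traced with the standard output-size formula for convolution and pooling layers. The sketch below uses the 7 × 7 kernel, stride 2, and padding 3 stated above and a 512 × 512 input patch; the max pooling kernel size and padding (3 and 1) and the number of later halving stages (`stages`) are assumptions, since the text does not give them.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output size of a convolution or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def encoder_sizes(input_size=512, stages=3):
    """Trace the feature-map side length through the encoder."""
    sizes = []
    # Initial 7x7 convolution with stride 2 and padding 3 (values from the text).
    size = conv_output_size(input_size, kernel=7, stride=2, padding=3)
    sizes.append(size)
    # Max pooling; kernel 3 and padding 1 are assumed, stride 2 halves the size.
    size = conv_output_size(size, kernel=3, stride=2, padding=1)
    sizes.append(size)
    # Each later downsampling step halves the size (number of steps is assumed).
    for _ in range(stages):
        size //= 2
        sizes.append(size)
    return sizes

sizes = encoder_sizes()
```

Each entry in `sizes` is the side length of one feature map that a skip connection could hand to the decoder.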

UnetDecoder module for microstructure features

Our decoder comprises four layers, with each layer receiving the encoding feature maps from its corresponding layer and higher layers. The fourth layer of the decoder is taken as an illustrative example: the encoding feature map from the fifth layer first undergoes upsampling, which reduces the number of feature maps while increasing their size. The upsampled feature map, obtained through transposed convolution, matches the size of the encoding feature map of the fourth layer

$$F_{upsampled}[i, j, c] = \sum_{m, n, k} F_{{trans}_{conv}}[i - m, j - n, c, k] \cdot F_{{cn}_{fin}}[m, n, k] \tag{14}$$

where $F_{{trans}_{conv}}$ represents the weight of the transposed convolution and '·' denotes element-wise multiplication. After the transposed convolution, the obtained feature map $F_{upsampled}$ has size $i \times j$ with $c$ channels. Next, $F_{upsampled}$ is concatenated along the channel dimension with the feature map $F_{2}$ passed directly from the fourth layer of the encoder

$$F_{concat}[i, j, k] = \begin{cases} F_{upsampled}[i, j, k], & \text{if } 0 \leq k < C_{1} \\ F_{2}[i, j, k - C_{1}], & \text{if } C_{1} \leq k < C_{1} + C_{2} \end{cases} \tag{15}$$
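The upsampling and concatenation steps can be illustrated on a toy example in plain Python. This sketch assumes a single-channel, unpadded transposed convolution with stride 2, and it reads formula (15) as placing the upsampled channels first and the skip-connection channels after them; the helper names and toy values are hypothetical.

```python
def transposed_conv2d(feature, kernel, stride=2):
    """Single-channel transposed convolution without padding: each input
    value scatters a scaled copy of the kernel into the output."""
    h, w = len(feature), len(feature[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for m in range(h):
        for n in range(w):
            for a in range(kh):
                for b in range(kw):
                    out[m * stride + a][n * stride + b] += feature[m][n] * kernel[a][b]
    return out

def concat_channels(upsampled, skip):
    """Channel concatenation: indices [0, C1) come from the decoder,
    indices [C1, C1 + C2) come from the encoder skip connection."""
    return upsampled + skip

deep_feature = [[1.0, 2.0], [3.0, 4.0]]           # 2x2 map from the deeper layer
kernel = [[1.0, 1.0], [1.0, 1.0]]                 # toy 2x2 transposed-conv weight
upsampled = transposed_conv2d(deep_feature, kernel)

skip = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]  # C2 = 2 encoder channels
concatenated = concat_channels([upsampled], skip)          # C1 = 1 decoder channel
```

With stride 2 and a 2 × 2 kernel the side length doubles from 2 to 4, so the upsampled map lines up with the 4 × 4 skip features before concatenation.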

The channel number of the concatenated feature map $F_{concat}$, formed by combining feature maps with $C_{1}$ and $C_{2}$ channels, is $C_{1} + C_{2}$. The concatenated feature map then passes through an attention module, followed by two convolutional layers. Subsequently, a second attention module is applied, followed by another upsampling step to align the feature map size with that of the encoding feature map from the third layer. Within the upsampling pathway, the number of feature maps gradually decreases, while their spatial dimensions progressively increase. Moreover, the scSE attention module⁴⁰ was incorporated to enhance our model's understanding of both local and global image structures. The scSE attention consists of two parallel modules: the sSE module (channel squeeze and spatial excitation) and the cSE module (spatial squeeze and channel excitation). The overall architecture of the scSE attention module is depicted in Figure 3. In the cSE module, the research⁴⁰ assumes that the input feature map $U$ is a combination of channels $u_{i} \in \mathbb{R}^{H \times W}$. The spatial information is compressed using global average pooling to produce a feature vector $Z \in \mathbb{R}^{1 \times 1 \times C}$, whose $k$-th element is given by formula (16)

$$Z_{k} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_{k}(i, j) \tag{16}$$

where the vector $Z$ contains global spatial information. Next, the vector passes through two fully connected (FC) layers with a ReLU operator to give $\hat{Z}$, and a sigmoid activation $\sigma$ is applied. The resulting vector is used for recalibration or activation of $U$ as in formula (17)

$$\hat{U}_{cSE} = F_{cSE}(U) = [\sigma(\hat{Z}_{1}) u_{1}, \sigma(\hat{Z}_{2}) u_{2}, \dots, \sigma(\hat{Z}_{C}) u_{C}] \tag{17}$$

where the activation $\sigma(\hat{Z}_{i})$ represents the importance of the $i$-th channel. In the sSE module,⁴⁰ the spatial squeeze operation is achieved through a convolution $q = W_{sq} * U$, where the weight $W_{sq} \in \mathbb{R}^{1 \times 1 \times C \times 1}$, resulting in a projection tensor $q \in \mathbb{R}^{H \times W}$. The generated tensor is then used for recalibration or activation of $U$ as in formula (18)

$$\hat{U}_{sSE} = F_{sSE}(U) = [\sigma(q_{1,1}) u^{1,1}, \dots, \sigma(q_{i,j}) u^{i,j}, \dots, \sigma(q_{H,W}) u^{H,W}] \tag{18}$$

where each value $\sigma(q_{i,j})$ corresponds to the relative importance of the spatial location $(i, j)$ in the given feature map. As shown in Figure 3, the concurrent spatial and channel SE,⁴⁰ $\hat{U}_{scSE}$, is obtained by element-wise addition of the channel and spatial excitation, $\hat{U}_{scSE} = \hat{U}_{cSE} + \hat{U}_{sSE}$. This step integrates information from both spatial and channel perspectives, enabling the model to focus more effectively on crucial spatial locations and channel features.
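The two excitation branches of formulas (16)–(18) and their sum can be sketched in plain Python on a tiny feature map stored as nested lists indexed `[channel][row][column]`. The weight matrices `W1`, `W2`, and `w_sq`, their shapes, and the ordering FC, ReLU, FC, sigmoid in the channel branch are assumptions for illustration rather than the trained parameters of Micro-KTNet.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cse_gate(U, W1, W2):
    """Channel gates: global average pooling (formula 16), FC with W2,
    ReLU, FC with W1, then sigmoid."""
    H, W = len(U[0]), len(U[0][0])
    z = [sum(sum(row) for row in channel) / (H * W) for channel in U]
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in W2]
    z_hat = [sum(w * v for w, v in zip(row, hidden)) for row in W1]
    return [sigmoid(v) for v in z_hat]

def sse_gate(U, w_sq):
    """Spatial gates: a 1x1 convolution across channels, then sigmoid."""
    C, H, W = len(U), len(U[0]), len(U[0][0])
    return [[sigmoid(sum(w_sq[k] * U[k][i][j] for k in range(C)))
             for j in range(W)] for i in range(H)]

def scse(U, W1, W2, w_sq):
    """Element-wise sum of the channel-excited and spatially excited maps."""
    channel_gate = cse_gate(U, W1, W2)
    spatial_gate = sse_gate(U, w_sq)
    return [[[(channel_gate[k] + spatial_gate[i][j]) * U[k][i][j]
              for j in range(len(U[k][i]))]
             for i in range(len(U[k]))]
            for k in range(len(U))]

# Toy input with C = 2 channels of size 2x2 and one hidden unit in the FC layers.
U = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, -1.0], [2.0, 5.0]]]
zero_out = scse(U, W1=[[0.0], [0.0]], W2=[[0.0, 0.0]], w_sq=[0.0, 0.0])
gates = cse_gate(U, W1=[[1.0], [-1.0]], W2=[[1.0, 1.0]])
```

With all-zero weights every gate equals 0.5, so the two branches each contribute half of the input and their sum reproduces `U`; non-zero weights let the gates emphasize some channels and locations over others.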

Figure 3.

The diagram depicts the structure of the scSE attention module.

Learning the proposed network

A network initialized with transfer learning requires only a small amount of training data. In this study, the Micro-KTNet model was trained using a small dataset of fiber granule aggregation, and its performance was evaluated on test samples. During the training process, the Adam50 optimizer was employed with an initial learning rate of 2e-4. The loss function used was a weighted combination of the binary cross-entropy (BCE) loss and the Dice loss, with BCE assigned a weight of 70%. BCE is generally more stable during the initial phases of training, which aids in speeding up model convergence. On the other hand, Dice loss places greater emphasis on pixel-level similarity and promotes the generation of smoother and more continuous segmentation results. By combining BCE with Dice loss, a reduction in overall loss was achieved while still benefiting from the stability conferred by BCE. The Micro-KTNet model was implemented in Python using PyTorch, the segmentation models library, and other relevant third-party libraries. The experiments were conducted on a personal computer with 24 GB of memory and an Nvidia GeForce RTX 3090 graphics processing unit.
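To make the role of the learning rate concrete, the following plain-Python sketch performs a single Adam update on one scalar parameter with the reported initial learning rate of 2e-4. The values of `beta1`, `beta2`, and `eps` are the commonly used defaults and are assumed here, since the text does not state them; the function name and the toy parameter and gradient values are hypothetical.

```python
import math

def adam_step(param, grad, m, v, t, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First update from a toy starting point (hypothetical values).
new_param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate regardless of the gradient's magnitude.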

Experimental evaluation and discussion

In this study, the performance of the Micro-KTNet model was assessed through a series of experiments focused on segmenting fiber granule microstructure aggregation. The following sections provide detailed explanations of the experimental setup and the implementation of the Micro-KTNet network. Furthermore, its performance was compared with other algorithms designed for segmenting fiber granule microstructure aggregation. Finally, an ablation study was conducted to dissect various components of the Micro-KTNet model.

Data creation and evaluation metrics

A scanning electron microscope (SEM) was used to capture high-resolution microstructure images of fiber granules. These images are detailed, but they also contain some unwanted interference, such as stains. To mitigate these issues stemming from the acquisition process, a series of preprocessing steps involving transformation, segmentation, and diversification techniques were employed. This culminated in the creation of a dataset termed SEM-DHU-150, consisting of microstructure aggregation data from fiber granules. In SEM-DHU-150, positive labels denote particles resulting from aggregation in fiber granule microstructures, while negative labels are assigned to other structures, such as gaps between fibers.

To construct the dataset, image patches of size 512 × 512 were randomly extracted from the high-resolution original images. The process of segmenting an original image into patches is illustrated in Figure 4. Positive samples were manually labeled. Ultimately, around 150 samples were assembled for the dataset, with 90 samples allocated to the training set, 40 samples to validation, and the remaining samples reserved for prediction. Given the limited number of training samples, data augmentation techniques were adopted to enhance the model’s performance. First, random horizontal flips were applied to the images with a 30% probability so that the model learns invariant image features. Second, random adjustments to contrast, brightness, and gamma were introduced to increase data diversity and enhance the model’s adaptability to variations in lighting conditions. Lastly, blurring and image sharpening effects were incorporated, prompting the model to focus more on critical features, such as image edges, during training. The results of data augmentation are shown in Figure 5.
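As an illustration of the first augmentation step, the sketch below applies a horizontal flip with probability 0.3. The function name and the list-of-rows image representation are assumptions made for this example; the text does not say which augmentation library was used. For segmentation, the label mask has to be flipped together with the image so that labels stay aligned with pixels.

```python
import random

def random_horizontal_flip(image, mask, p=0.3, rng=random):
    """Flip image and mask together with probability p.

    image and mask are lists of rows; both receive the same transform
    so that every label stays attached to its pixel.
    """
    if rng.random() < p:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    return image, mask
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible across runs.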

Figure 4.

An example illustrating the segmentation process of an original image.

Figure 5.

The results of data augmentation: images (a) and (e) are the original unprocessed images, (b) and (f) are the horizontally flipped versions, (c) and (g) show adjustments to brightness and contrast, respectively, and (d) and (h) show the results of blurring and sharpening.

Figure 6.

Panels (a) to (f) show examples of fiber masterbatch aggregation segmentation. In these examples, yellow pixels represent true positives, green pixels represent false negatives, red pixels represent false positives, and blue pixels represent true negatives.

To evaluate the effectiveness of the Micro-KTNet model, four quantitative metrics were employed: IOU, precision, recall, and F1-score, which are defined as follows:

\mathrm{IOU} = \frac{TP}{TP + FN + FP} \times 100\%

(19)

\mathrm{precision} = \frac{TP}{TP + FP} \times 100\%

(20)

\mathrm{recall} = \frac{TP}{TP + FN} \times 100\%

(21)

\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \times 100\%

(22)

where TP refers to the proportion of samples (pixels) with clustering that are correctly identified as having clustering, FN refers to the proportion of samples with clustering that are incorrectly identified as not having clustering, and FP refers to the proportion of samples without clustering that are incorrectly identified as having clustering.
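The four metrics can be computed directly from pixel counts. The sketch below assumes binary masks flattened into lists of 0 and 1 and uses raw counts rather than proportions, which leaves the ratios unchanged; the function name and the zero-division convention are choices made for this example. Because precision and recall are already expressed in percent here, their harmonic mean is a percentage without further scaling.

```python
def segmentation_metrics(pred, truth):
    """Return IOU, precision, recall and F1-score in percent for binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    def pct(num, den):
        # Convention for this sketch: an empty denominator yields 0.
        return 100.0 * num / den if den else 0.0

    iou = pct(tp, tp + fn + fp)
    precision = pct(tp, tp + fp)
    recall = pct(tp, tp + fn)
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}
```

For a prediction `[1, 1, 0, 0, 1]` against a ground truth `[1, 0, 0, 1, 1]`, there are two true positives, one false positive, and one false negative, giving an IOU of 50% and precision, recall, and F1-score of about 66.7% each.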

Experiment results

In this study, the effectiveness of the fiber granule microstructure segmentation method was assessed and a comparative analysis of its segmentation results was conducted against other methodologies.

The Micro-KTNet model demonstrates strong performance in segmenting fiber granule aggregates. When SEM images contain only large aggregate particles within fiber granules, the proposed network segments these particles accurately. Even when SEM images contain multiple aggregates with slight variations in size, the network still performs well. However, challenges arise when SEM images feature a substantial number of aggregate particles with considerable size differences, leading to occasional mis-segmentation or missed segmentation. This is evident in Figure 6(a) and (e), and it can be attributed to two key factors. First, some aggregate particles are situated at the edges of the image, where the model lacks adequate contextual information to learn these features effectively, resulting in missed segmentation, as depicted in Figure 6(a). Second, the complexity of the fiber texture can complicate the differentiation process, as certain areas of the fiber texture resemble the texture of small aggregates, which makes it difficult for the model to distinguish between them accurately. In summary, the model achieves an IOU (Intersection over Union) score of 86.65% on the SEM-DHU-150 dataset, signifying its overall strong performance in segmenting fiber granule microstructures.

Comparative analysis

In the evaluation on the SEM-DHU-150 dataset, the segmentation results produced by Micro-KTNet were compared with several commonly used segmentation methods. Among the chosen approaches, Lin et al.⁴¹ leverage a multi-scale feature pyramid to integrate semantic information from different levels. Zhao et al.⁴² employ a structure called Pyramid Pooling to capture multi-scale contextual information. Additionally, Unet,⁴³ LinkNet,⁴⁴ MANet,⁴⁵ UnetPlusPlus,⁴⁶ and DeepLabV3Plus³⁸ all follow an encoder-decoder architecture. Unet distinguishes itself with skip connections, while MANet introduces attention mechanisms based on the Unet framework. UnetPlusPlus extends network depth and adopts a more densely connected structure. DeepLabV3⁴⁷ employs dilated convolutions to expand the receptive field, along with global average pooling and multi-scale information fusion to enhance object boundary segmentation. DeepLabV3Plus further enhances segmentation accuracy by adopting an encoder-decoder structure.

The quantitative results presented in Table 1 demonstrate that Micro-KTNet outperforms all other methods in terms of IOU (Intersection over Union). This outcome aligns with our initial expectations, suggesting that the transfer learning approach enhances the model’s generalization performance. As shown in Figure 7, the proposed framework indeed achieves the highest F1-score compared with other methods. This achievement can be partly attributed to the attention module in the decoder, which integrates both global and local information, resulting in a more balanced precision and recall. Ablation experiments were conducted in Section 4.4 to further investigate the effectiveness of these two modules. Additionally, a qualitative analysis is conducted on partial segmentation results of selected networks, as shown in Figure 8.

Table 1.

The performance of different models on the fiber masterbatch microstructures aggregation dataset.

Methods	IOU (%)	Precision (%)	Recall (%)	F1-score (%)	Test time (ms)
PSPNet	62.43	57.46	64.43	63.62	4.43
FPN	64.84	68.95	68.60	66.27	6.54
LinkNet	72.12	70.25	77.23	74.44	6.76
MANet	75.22	74.05	77.21	75.62	12.04
DeepLabV3	74.68	81.76	82.32	81.10	13.24
Unet	74.05	78.35	81.20	80.31	7.16
UnetPlusPlus	76.00	82.43	88.39	84.35	15.82
Micro-KTNet	86.65	91.85	97.21	94.50	8.94

Bold values represent the best statistically significant results.

Figure 7.

The F1-score of each method during the learning procedure.

Figure 8.

The segmentation results of different models on the fiber masterbatch microstructure aggregation dataset. Panels (a) to (e) are ordered by decreasing aggregate particle size in the input images, with (a) showing the largest aggregated particles and (e) the smallest. In these results, yellow pixels represent true positives, green pixels represent false negatives, red pixels represent false positives, and blue pixels represent true negatives.

When the input consists of Figure 8(d) and (e), both DeepLabV3 and Unet tend to miss the desired parts of the aggregate particles (indicated in green). In particular, Unet struggles with segmenting the edges, and its performance decreases significantly when the input is Figure 8(c), where nearly 60% of the aggregate particles are not properly segmented. UnetPlusPlus, while an improvement over Unet, still suffers from issues such as segmenting out undesired aggregate particles at the image edges (highlighted in red). By comparison, the proposed Micro-KTNet network exhibits fewer misidentifications at the image edges and significantly outperforms both DeepLabV3 and Unet in segmenting the desired aggregate particles. In fact, the Micro-KTNet network achieves nearly perfect segmentation in these cases. In summary, the proposed Micro-KTNet network demonstrates the most robust performance in binary segmentation, outperforming the other methods considered on the benchmark dataset of masterbatch agglomeration.

Ablation experiments

In the preceding section, the strong performance of the Micro-KTNet model was attributed to the improved generalization achieved through transfer learning and to the attention module in the decoder. This section examines the effectiveness of these two components through ablation experiments. Table 2 presents the ablation results obtained when either the transfer learning component or the attention module is excluded. The ablation study reveals that the segmentation performance of Micro-KTNet without the transfer learning component is inferior to that of Micro-KTNet without the attention module. This indicates that the transfer learning component plays the more pivotal role in Micro-KTNet.

Table 2.

The performance of the Micro-KTNet model on the fiber masterbatch microstructure aggregation dataset under different conditions.

Methods	IOU (%)
Micro-KTNet (without attention module and transfer learning component)	72.12
Micro-KTNet (without attention module)	83.18
Micro-KTNet (without transfer learning component)	75.37
Micro-KTNet-MicroNet	86.65
Micro-KTNet-ImageNet	85.07
Micro-KTNet-Image-MicroNet	87.04

Bold values represent the best statistically significant results.

Table 2 also sheds light on how reusing convolutional layers pre-trained on different datasets, namely the MicroNet dataset, the ImageNet dataset, and a combined Image-MicroNet dataset, affects the model’s segmentation performance. The model performs well in all three cases, with an IOU above 85%. However, the model pre-trained on the MicroNet dataset outperforms the one pre-trained on the ImageNet dataset. One plausible explanation is that the filters trained on the ImageNet dataset are not entirely suited to the unique characteristics of fiber masterbatch microstructure aggregation images. Nevertheless, the best result is obtained by pre-training the convolutional neural network on ImageNet and subsequently fine-tuning it on the MicroNet dataset.
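The relative contribution of the two components can be read off Table 2 with simple arithmetic. The sketch below copies the IOU values from the table; the variable names are illustrative, and it assumes that the "without attention module" row uses the same MicroNet pre-training as the full model, which the table does not state explicitly.

```python
# IOU values (%) copied from Table 2.
baseline = 72.12        # without attention module and transfer learning
attention_only = 75.37  # row "without transfer learning component"
transfer_only = 83.18   # row "without attention module" (pre-training assumed MicroNet)
full_micronet = 86.65   # both components, MicroNet pre-training

# Gains over the baseline, in percentage points.
gain_attention = round(attention_only - baseline, 2)
gain_transfer = round(transfer_only - baseline, 2)
gain_full = round(full_micronet - baseline, 2)

print(gain_attention, gain_transfer, gain_full)
```

Under these assumptions, transfer learning alone adds about 11.06 points over the baseline, the attention module alone adds about 3.25 points, and the two together add about 14.53 points.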

Assessing the impact of architectures

In this study, a large number of experiments were conducted to explore the effects of replacing ResNeXt-50 with various popular network architectures. These architectures include DenseNet121,²⁴ DPN107,⁴⁸ EfficientNet-B5,⁴⁹ Inception-ResNet-V2,⁵⁰ Inception-V4,⁵¹ MobileNet-V2,⁵² ResNet-50,⁵³ ResNeXt-50_32 × 4d,⁵⁴ SENet-154,⁵⁵ VGG-16_bn,⁵⁶ and Xception.⁵⁷ All networks were fine-tuned using transfer learning, with the same training configurations applied to ensure consistency. The results showed notable differences in performance. The IOU (Intersection over Union) scores ranged from 66.01% (MobileNet-V2) to 86.65% (ResNeXt-50_32 × 4d).

Table 3 shows that ResNeXt-50_32 × 4d achieved the highest IOU of 86.65%. This result highlights its strong ability to extract features for this task. SENet-154 and Xception also performed well, with IOU scores of 85.08% and 85.17%, respectively. This suggests that models with advanced feature integration mechanisms can perform better. In contrast, EfficientNet-B5 and MobileNet-V2 had lower scores, with IOUs of 67.95% and 66.01%. Their lightweight designs may have limited their feature extraction capabilities. Models such as ResNet-50, VGG-16_bn, and the Inception series (Inception-ResNet-V2 and Inception-V4) showed moderate performance. These results confirm that the choice of encoder architecture has a significant impact on task performance.

Table 3.

The performance of the Micro-KTNet model on the fiber masterbatch microstructure aggregation dataset with different encoders.

Methods	IOU (%)
DenseNet121	82.48
DPN107	83.97
EfficientNet-B5	67.95
Inception-ResNet-V2	81.37
Inception-V4	77.76
MobileNet-V2	66.01
ResNet-50	82.30
ResNeXt-50_32 × 4d	86.65
SENet-154	85.08
VGG-16_bn	81.00
Xception	85.17

Bold values represent the best statistically significant results.

Assessing the impact of hyperparameters

In this experiment, some hyperparameters, such as learning rate and batch size, were found to affect the performance of the proposed method. To explore the relationship between hyperparameters and segmentation results, four experiments with different batch sizes were conducted, and the model’s performance was evaluated using four metrics (IOU, F1-score, precision, recall), as shown in Figure 9. A larger batch size enables the model to consider more samples in each update step, facilitating the learning of complex patterns and structures in the data. For example, when the batch size increased from 2 to 14 in the experiment, the IOU improved from 81.12% to 86.59%, the F1 score rose from 86.93% to 94.50%, and the precision and recall increased from 82.33% and 88.31% to 91.85% and 97.21%, respectively. These improvements demonstrate how a larger batch size can stabilize the gradient updates and enhance the model’s ability to generalize patterns within the data.

Figure 9.

Evaluation of performance with different batch sizes.

However, when the batch size became too large, such as increasing it to 20, the metrics began to decline. For instance, the IOU dropped to 81.72%, the F1 score decreased to 87.03%, and both precision and recall fell to 82.56% and 89.13%, respectively. This decline suggests that while larger batch sizes allow the model to process more data in a single iteration, they also reduce the number of gradient update steps within an epoch. This can lead to less frequent weight updates and the loss of fine-grained gradient information, which is critical for learning subtle patterns in the data.

Furthermore, in the context of smaller datasets, this problem can be exacerbated. Excessively large batch sizes mean that the model might process the entire dataset in just a few iterations, limiting its ability to fully explore the data distribution and learn diverse features. As seen in this case, when the batch size continued to increase to 26 and 32 (hypothetical data), the IOU further decreased to 81.00% and 79.80%, while the F1 score fell to 86.00% and 84.20%. These trends illustrate the trade-off: while larger batch sizes improve computational efficiency, they can hurt the model’s generalization ability if set excessively high.
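The effect of batch size on the number of gradient updates can be made concrete with a short calculation. The sketch assumes the 90 training samples described in the data section, no enlargement of the epoch through augmentation, and that the last partial batch is kept; the function name is illustrative.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Number of optimizer steps in one pass over the training set."""
    return math.ceil(num_samples / batch_size)

# Assumption: 90 training samples, batch sizes taken from the experiment above.
steps = {b: updates_per_epoch(90, b) for b in (2, 14, 20, 26, 32)}
print(steps)
```

Under these assumptions, a batch size of 2 gives 45 updates per epoch, whereas batch sizes of 20 and 32 give only 5 and 3 updates, respectively, which is consistent with the explanation that very large batches leave few weight updates per epoch on a dataset of this size.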

Assessing the impact of training sample size

The purpose of this experiment was to investigate the impact of the number of training samples on the performance of the model, providing further validation of its adaptability and generalization ability under limited data conditions. By adjusting the size of the training set, the study aimed to evaluate the model’s performance in scenarios with extremely few labeled samples and reveal its behavior as the amount of data increases.

In the experimental setup, the test set was fixed at 40 images, while the training set was divided into subsets with 1 image (40:1), 5 images (8:1), 10 images (4:1), 20 images (2:1), and 40 images (1:1). The model’s performance was evaluated using the Intersection over Union (IoU) metric as the primary indicator, complemented by pixel-level classifications illustrated in the provided visual results (Figure 10).

Figure 10.

(a) shows an example from the test set of the SEM-DHU-150 dataset. (b) shows the evaluation of performance with different numbers of training samples. In these results, yellow pixels represent true positives, green pixels represent false negatives, red pixels represent false positives, and blue pixels represent true negatives.

The model achieved 79.55% IoU with 40 training samples, 75.61% IoU with 20 samples, 68.59% IoU with 10 samples, 46.32% IoU with 5 samples, and 23.74% IoU with only 1 sample. Several observations can be drawn from these results. First, even with very few samples, such as 5 images (46.32% IoU), the model still exhibited reasonable performance. This is attributed to the use of a pre-trained encoder, which was trained on the MicroNet dataset and possesses strong feature extraction capabilities, thereby enabling effective performance with limited training data. Second, as the number of training samples increased, the model’s performance improved steadily. For example, IoU increased from 23.74% with 1 image to 79.55% with 40 images. This improvement is due to the additional training data enhancing the model’s recognition and generalization ability when the ratio of training to testing data is below 1:1. Third, in the most extreme case, with only 1 training image, the results were suboptimal. This is likely because a single image is far less representative than the 40-image test set, causing the model to focus excessively on limited features and struggle with generalization.

The results of this experiment provide further evidence of the model’s strong performance in low-resource scenarios while also highlighting the advantages of increasing training data for achieving optimal performance. Under extremely limited data conditions, the results depend heavily on the quality of the pre-trained features, as demonstrated in the 5-image training scenario, where the pre-trained encoder contributed significantly to the relatively strong performance.

Conclusions and future work

This paper proposes a novel microstructure recognition architecture, named Micro-KTNet, which is designed for segmenting SEM images of fiber masterbatch aggregation. The model incorporates a transfer learning-based encoder, pre-trained on the MicroNet dataset, to extract domain-specific features effectively. The decoder, enhanced with scSE attention modules, combines global and local features to achieve precise segmentation. The encoder-decoder architecture with skip connections facilitates multi-scale feature utilization, ensuring detailed and contextually consistent predictions. On the SEM-DHU-150 dataset, Micro-KTNet achieves state-of-the-art performance, with an IOU of 86.65% and an F1-score of 94.50%. Despite its strong performance, challenges remain in segmenting aggregation particles near image edges and in distinguishing fine textures within complex regions. Future work will address these limitations while also focusing on integrating explainable AI methods, such as visualizing attention maps, to enhance the transparency of the model’s decision-making process. This addition will improve industry trust and facilitate broader adoption in industrial environments.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Shanghai Sailing Program (22YF1401300), Cultivation Project of Discipline Innovation (XKCX202313) and Postdoctoral Research Foundation of China (2024M752306).

ORCID iD

Bing Wei

References

1. Prashantha, Soulestin, Lacrampe, et al. Multi walled carbon nanotube filled polypropylene nanocomposites based on masterbatch route: improvement of dispersion and mechanical properties through pp-g-ma addition. Express Polym Lett 2008; 2(10): 735–745.

2. Kumagai, Tajima, Iwamoto, et al. Properties of natural rubber reinforced with cellulose nanofibers based on fiber diameter distribution as estimated by differential centrifugal sedimentation. Int J Biol Macromol 2019; 121: 989–995.

3. Pötschke, Bhattacharyya, Janke. Melt mixing of polycarbonate with multiwalled carbon nanotubes: microscopic studies on the state of dispersion. Eur Polym J 2004; 40(1): 137–148.

4. Bledzki, Faruk. Injection moulded microcellular wood fibre–polypropylene composites. Compos Appl Sci Manuf 2006; 37(9): 1358–1367.

5. Bello, Nápoles, Sánchez, et al. Deep neural network to extract high-level features and labels in multi-label classification problems. Neurocomputing 2020; 413: 259–270.

6. Wang, Fan, Wang. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recogn Lett 2021; 141: 61–67.

7. Zhao, Zheng, et al. Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 2019; 30(11): 3212–3232.

8. Garcia-Garcia, Orts-Escolano, Oprea, et al. A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv:1704.06857 2017.

9. Feldmann, Youngblood, Wright, et al. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature 2019; 569(7755): 208–214.

10. Iqbal Hussain, Khan, Wang, et al. Woven fabric pattern recognition and classification based on deep convolutional neural networks. Electronics 2020; 9(6): 1048.

11. Bhuvaneswari, Priyadharshini, Deepa, et al. Deep learning for material synthesis and manufacturing systems: a review. Mater Today Proc 2021; 46: 3263–3269.

12. Hou, Yuan, et al. Deep learning-assisted real-time defect detection and closed-loop adjustment for additive manufacturing of continuous fiber-reinforced polymer composites. Robot Comput Integrated Manuf 2023; 79: 102431.

13. Wei, Hao, Gao, et al. Bioinspired visual-integrated model for multilabel classification of textile defect images. IEEE Trans Cogn Dev Syst 2020; 13(3): 503–513.

14. Tan, Yao, Peng, et al. Multi-level medical image segmentation network based on multi-scale and context information fusion strategy. IEEE Trans Emerg Top Comput Intell 2024; 8: 474–487.

15. Cao, Gao, et al. Fundus image enhancement via semi-supervised gan and anatomical structure preservation. IEEE Trans Emerg Top Comput Intell 2024; 8: 313–326.

16. Wei, Hao, Gao, et al. Detecting textile micro-defects: a novel and efficient method based on visual gain mechanism. Inf Sci 2020; 541: 60–74.

17. Zheng, et al. Efficient recognition and automatic sorting technology of waste textiles based on online near infrared spectroscopy and convolutional neural network. Resour Conserv Recycl 2022; 180: 106157.

18. Zhang, Han, Wei, et al. A spatial–spectral adaptive learning model for textile defect images recognition with few labeled data. Complex Intell Syst 2023; 9(6): 6359–6371.

19. Quan, Hildebrand DGC, Jeong. Fusionnet: a deep fully residual convolutional neural network for image segmentation in connectomics. Front Comput Sci 2021; 3: 613981.

20. Chen, Liu, Zhang, et al. Transattunet: multi-level attention-guided u-net with transformer for medical image segmentation. IEEE Trans Emerg Top Comput Intell 2024; 8: 55–68.

21. Wei, Gao, Tang, et al. Multi-class object learning with application to fabric defects detection. AATCC J Res 2021; 8(1 suppl): 165–172.

22. Zeng, Xie, Zhang, et al. Ric-unet: an improved neural network based on unet for nuclei segmentation in histology images. IEEE Access 2019; 7: 21420–21428.

23. Zhang, et al. Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European conference on computer vision (ECCV), 2018, pp. 286–301.

24. Taormina, Tegolo, Valenti. Transfer learning approach for high-imbalance and multi-class classification of fluorescence images. In: Destercke, Martinez, Sanfilippo (eds). Scalable Uncertainty Management. SUM 2024. Lecture Notes in Computer Science. Cham: Springer, 2025, vol 15350, pp. 461–469. DOI: 10.1007/978-3-031-76235-2_34.

25. Ribani, Marengoni. A survey of transfer learning for convolutional neural networks. In: 2019 32nd SIBGRAPI conference on graphics, patterns and images tutorials (SIBGRAPI-T), Rio de Janeiro, Brazil, 28–31 October 2019. IEEE, pp. 47–57.

26. Tan, Sun, Kong, et al. A survey on deep transfer learning. In: Artificial neural networks and machine learning–ICANN 2018: 27th international conference on artificial neural networks, Rhodes, Greece, 4–7 October 2018, Proceedings, Part III 27. Springer, pp. 270–279.

27. Oquab, Bottou, Laptev, et al. Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 1717–1724.

28. Simonyan, Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 2014.

29. Liu, Shen, Lin. Deep convolutional neural fields for depth estimation from a single image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5162–5170.

30. Hassanzadeh, Essam, Sarker. Evodcnn: an evolutionary deep convolutional neural network for image classification. Neurocomputing 2022; 488: 271–283.

31. Stuckner, Harder, Smith. Microstructure segmentation with deep learning encoders pre-trained on a large microscopy dataset. Npj Comput Mater 2022; 8(1): 200.

32. Cho, Van Merriënboer, Gulcehre, et al. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 2014.

33. Wang, et al. Chinese poetry generation with planning based neural network. arXiv preprint arXiv:1610.09889 2016.

34. Serban, Sordoni, Lowe, et al. A hierarchical latent variable encoder-decoder model for generating dialogues. In: Proceedings of the AAAI conference on artificial intelligence, 2017, 31(1).

35. Park, Kim, Kang, et al. Sequence-to-sequence prediction of vehicle trajectory via lstm encoder-decoder architecture. IEEE intelligent vehicles symposium (IV). IEEE, 2018, pp. 1672–1678. (accessed 2018).

36. Yasrab, Zhang. Scnet: a simplified encoder-decoder cnn for semantic segmentation. In: 2016 5th international conference on computer science and network technology (ICCSNT). IEEE, 2016, pp. 785–789.

37. Badrinarayanan, Kendall, Cipolla. Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 2017; 39(12): 2481–2495.

38. Chen, Zhu, Papandreou, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), 2018, pp. 801–818.

39. Yang, Sun, et al. Seismic fault detection using an encoder–decoder convolutional neural network with a small training set. J Geophys Eng 2019; 16(1): 175–189.

40. Roy, Navab, Wachinger. Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In: Medical image computing and computer assisted intervention–MICCAI 2018: 21st international conference, Granada, Spain, 16–20 September 2018, Proceedings, Part I. Springer, pp. 421–429.

41. Lin TY, Dollár, Girshick, et al. Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2117–2125.

42. Zhao, Shi, et al. Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2881–2890.

43. Ronneberger, Fischer, Brox. U-net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III 18. Springer, pp. 234–241.

44. Chaurasia, Culurciello. Linknet: exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE visual communications and image processing (VCIP). IEEE, 2017, pp. 1–4.

45. Liang, Sun, Zhang, et al. Mutual affine network for spatially variant kernel estimation in blind image super resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4096–4105.

46. Zhou, Siddiquee MMR, Tajbakhsh, et al. Unet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans Med Imaging 2019; 39(6): 1856–1867.

47. Chen, Papandreou, Schroff, et al. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 2017.

48. Huang, Liu, Van Der Maaten, et al. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708.

49. Chen, Xiao, et al. Dual path networks. Adv Neural Inf Process Syst 2017; 30: 4468–4476.

50. Tan. Efficientnet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning. PMLR, 2019, pp. 6105–6114.

51. Szegedy, Ioffe, Vanhoucke, et al. Inception-v4, inception-resnet and the impact of residual connections on learning. Proc AAAI Conf Artif Intell 2017; 31(1). DOI: 10.1609/aaai.v31i1.11231.

52. Sandler, Howard, Zhu, et al. Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520.

53. Zhang, Ren, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

54. Xie, Girshick, Dollár, et al. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1492–1500.

55. Shen, Sun. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7132–7141.

56. Simonyan, Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 2014.

57. Chollet. Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.

Micro-KTNet: Microstructure knowledge transfer learning for fiber masterbatch agglomeration recognition
