Abstract
Fabric defect detection plays a critical role in quality control for the textile manufacturing industry. Deep learning-based saliency models can quickly spot the regions that most attract human attention against a complex background, and have been successfully applied to fabric defect detection. However, most previous methods mainly adopt multi-level feature aggregation yet ignore the complementary relationships among different features, resulting in poor representation of tiny and slender defects. To remedy these issues, we propose a novel saliency-based fabric defect detection network, which exploits the complementary information between different layers to enhance the feature representation ability and the discrimination of defects. Specifically, a multi-scale feature aggregation unit (MFAU) is proposed to effectively characterize multi-scale contextual features. Besides, a feature fusion refinement module (FFR), composed of an attention fusion unit (AFU) and an auxiliary refinement unit (ARU), is designed to exploit complementary information and further refine the input features, enhancing the discriminative ability of defect features. Finally, multi-level deep supervision (MDS) is adopted to guide the model to generate more accurate saliency maps. Under different evaluation metrics, our proposed method outperforms most state-of-the-art methods on our developed fabric datasets.
Keywords
Introduction
Defect detection on the fabric surface is essential for the textile industry. During production, owing to machine or human factors, various defects such as warp, weft, or point defects often appear on the fabric surface. 1 Traditional manual visual inspection tends to be susceptible to human factors such as inattentiveness or fatigue, resulting in low detection accuracy and efficiency. Importantly, unlike automated detection, the manual method typically relies on expert experience, so it cannot provide a quantifiable measurement of defect detection effectiveness. It has been reported that the accuracy of manual inspection reaches only 60–75% 2 based on feedback from the textile industry. With the development of digital image processing and machine vision, a number of automatic detection methods have recently been proposed to replace manual inspection, which are mainly divided into four categories: statistical methods,3,4 structural methods,5,6 spectral methods,7–10 and model-based methods.11,12 Nevertheless, these traditional detection methods are susceptible to changes in illumination, background texture, and camera angle, which limits their usage in real-world applications.
Visual saliency detection imitates human visual characteristics 13 to determine where objects or regions in an image are most likely to attract human attention. It has attracted increasing attention and shown great success in a variety of fields, including hyper-spectral anomaly detection, 14 saliency detection, 15 and real-time wood classification. 16 As fabric defects are generally salient in fabric images compared to the complex texture background, fabric defect detection can be regarded as a salient object detection (SOD) problem, and thus several visual saliency-based fabric defect detection methods17–19 have been developed and achieved good performance. Earlier saliency-based approaches, however, rely on hand-crafted features, such as color, intensity, and contrast, to characterize local details and global context, leaving them unable to capture high-level features with semantic clues, which limits their capability to locate complete defect objects in complex scenarios.
With the rapid development of convolutional neural networks (CNNs), CNN-based methods have broken through the restriction of traditional SOD methods' dependence on hand-crafted features and pushed SOD performance to a new level, owing to their powerful capability of extracting both high-level semantics and low-level texture details. CNNs have been widely applied in various vision tasks, such as image classification, 20 semantic segmentation, 21 image retrieval, 22 image and video compression, 23 scene classification, 24 and object recognition. 25 Benefiting from this, some learning-based methods25–34 have been proposed for fabric defect detection, achieving remarkable detection performance with a promising speed-accuracy trade-off. However, two main challenges remain for the saliency-based defect detection task.
First, most existing feature-fusion-based saliency models mainly use multi-mode fusion of multi-level features while ignoring the complementary relationship between features of different scales at different layers, which results in poor characterization of small and slender defects and thus unsatisfactory detection results. As shown in Figure 1, the hole (see Figure 1(a)) and stain defects (see Figure 1(b)), as common defects, have regular shapes and scales and are relatively easy to detect. However, defects such as slender defects (see Figure 1(c)), small-scale defects (see Figure 1(d)), broken yarn (see Figure 1(e)), and off-yarn (see Figure 1(f)) are usually tiny and irregular in shape, making effective features harder to capture than for defects with regular shapes. Second, owing to the low contrast with the background texture (see Figure 1(g) and (h)), existing methods cannot effectively extract defect features to improve detection accuracy.

Different types of defects: (a) hole, (b) stains, (c) slender stains, (d) tiny holes, (e) broken yarn, (f) off-yarn, (g) indentation, and (h) ribbon yarn.
To address the above issues, we propose a novel saliency-based context-aware progressive attention aggregation network for fabric defect detection, named CPA²Net, which achieves excellent performance in detecting all kinds of defects, especially tiny and slender ones. Specifically, a context-aware multi-scale module (CAMS) comprised of multi-scale feature aggregation units (MFAUs) is proposed to leverage a series of dilated convolutional layers for enlarging the receptive field without increasing the computational load, which is conducive to detecting defects of various scales and to the representation learning of the model. MFAU cascades multiple dilated branches, in which cross-level dense connections are added in a top-down manner to sufficiently aggregate context-aware, multi-scale, multi-receptive-field features for better predictions.
However, the aggregated features obtained by MFAU may contain redundant information, which can result in noise and blurry boundaries. Furthermore, considering that defects commonly have low contrast with the background texture, it is essential to enhance the visibility and discrimination of defects. As such, a feature fusion refinement module (FFR) is proposed to mitigate the above-mentioned issues, which is mainly composed of an attention fusion unit (AFU) and an auxiliary refinement unit (ARU). Specifically, AFU is designed to reduce redundancy and adaptively select useful information, concentrating on important regions while suppressing background interference; a coordinate attention (CA) 35 module followed by sum and concatenation operations is adopted to embed positional information into channel attention, enhancing the discriminative capability of defect features for complete predictions. In addition, low-level features commonly contain rich detailed information, while high-level features commonly contain more global semantic clues. Therefore, to take advantage of cross-level complementary information for optimizing the training phase, a feedback operation is performed along a top-down pathway to transfer semantic information from deeper layers to shallower layers. Then, ARU is designed to further refine and highlight the input features. Moreover, we adopt a multi-layer loss function, as done in Hou et al.,36,37 to supervise the side outputs for more accurate prediction maps.
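The directional pooling at the heart of the CA module can be sketched as follows. This is a minimal numpy illustration of the idea only; the actual AFU wraps shared convolutions and the sum/concatenation steps around it, which are omitted here:

```python
import numpy as np

def coordinate_pools(x):
    """Directional average pooling used by coordinate attention: pool along
    the width to keep one statistic per row, and along the height to keep
    one per column, so positional information is preserved.
    x: feature map of shape (C, H, W)."""
    pool_h = x.mean(axis=2, keepdims=True)  # (C, H, 1): per-row statistics
    pool_w = x.mean(axis=1, keepdims=True)  # (C, 1, W): per-column statistics
    return pool_h, pool_w

def coordinate_gate(x):
    """Toy gating step: turn the pooled tensors into sigmoid attention maps
    and reweight the input by broadcasting over H and W."""
    pool_h, pool_w = coordinate_pools(x)
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    return x * sigmoid(pool_h) * sigmoid(pool_w)
```

Because the two pooled tensors retain the row and column indices, the resulting gate can respond to *where* a defect lies, not just which channels fire, which is the motivation for embedding positional information into channel attention.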
The contributions of our work can be summarized as follows:
A novel saliency-based method named CPA²Net is proposed for fabric defect detection, which strengthens the discriminative representation of defect features for accurate detection.
CAMS is designed to enlarge the receptive field by cascading several parallel MFAUs with dilated convolution layers, better coping with the scale variation of fabric defects, especially tiny and linear defects.
FFR, composed of AFU and ARU, is designed so that our model can not only combine spatial and channel information to focus on defective areas, but also enhance the contrast between defects and the background to further improve the discrimination of defect features.
Compared with 14 state-of-the-art saliency models in terms of six evaluation metrics, our proposed method shows superior performance on our built datasets.
The rest of this paper is organized as follows. Section 2 briefly discusses related research. Section 3 presents the details of our proposed saliency-based fabric defect detection method, in which we describe the proposed CAMS, FFR, and multi-level deep supervision (MDS) techniques. Section 4 reports the experimental results of our method, and Section 5 concludes the paper.
Related work
Traditional fabric defect detection methods
In this section, previous fabric defect detection methods are introduced in detail from two aspects, and the advantages and disadvantages of previous work are listed in Table 1.
Advantages and disadvantages of previous works.
Statistical methods
Statistical methods usually distinguish defective and defect-free regions by analyzing first-order and second-order statistics, such as histogram features, mathematical morphology, grey-level co-occurrence matrices, and auto-correlation functions. Hamdi et al. 38 combined a grey-level co-occurrence matrix and Euclidean distance followed by a selection threshold to achieve fabric defect detection; however, this method is not suitable for color images. Zhang and Bresee 39 combined morphology with an auto-correlation function for defect detection, which is robust to illumination and noise but suffers from higher complexity and computational cost.
Spectral methods
Spectral methods locate defects by highlighting differences between defective and defect-free regions in the spectral domain, and work well for emphasizing defect edges. The Fourier transform, 7 wavelet transforms,10,40 and Gabor filtering8,9 are commonly used in spectral methods. Jing et al. 47 combined a genetic algorithm and a Gabor filter to filter patterned fabric images, and then segmented the processed images to localize defects. Hu et al. 40 combined the wavelet transform and Fourier analysis to examine fabrics with periodic texture backgrounds through an unsupervised algorithm. However, these models only work with repetitively textured or unpatterned fabric images, and their accuracy is highly dependent on the specified parameters and filters, making them inefficient at generalizing and adapting to new situations.
Structural methods
The structural method regards the texture as a primitive, extracts the structural features of the fabric texture, and infers its placement rule, so that the texture background of the entire fabric image can be deduced from a simple structural rule. Li et al. 41 proposed an algorithm with self-adaptive partition block modeling that exploits the strong correlation within patterned fabric images for defect detection. However, structural methods have a low detection rate and are only suitable for fabric texture images with extremely regular texture structures.
Model-based methods
Model-based fabric defect detection algorithms model the pattern and texture information to generate model parameters, and perform defect detection by judging whether the model parameters are satisfied; examples include Markov random fields, 42 Bayesian models, 43 and low-rank decomposition models. 48 Zhang et al. 49 segmented jacquard warp-knitted fabric images using jacquard fabric characteristics and Markov random field theory. Mottalib et al. 50 used a Bayesian model to accurately classify fabric defects based on their geometric features. Nevertheless, model-based methods are seldom utilized due to their high dependency on data and complex calculation.
Salient object detection-based fabric defect detection methods
Handcrafted feature-based salient object detection
In recent years, visual saliency has gained increasing research interest and has been successfully applied in many research fields. Inspired by this, many researchers have modeled visual saliency for fabric defect detection and achieved great progress. By analyzing the saliency of local textures in context, Liu et al. 44 proposed a fabric defect detection model that performed significantly better than other models on plain fabric images. A learned dictionary was employed by Li et al. 19 to generate saliency maps, and a modified valley-emphasis method was used to segment the defective regions. Zhang et al. 51 characterized global and background features using visual saliency maps, and then utilized a support vector machine (SVM) to classify fabric defects. However, traditional visual saliency methods mainly rely on hand-crafted low-level features, which limits their capability to locate complete defect regions in cluttered backgrounds because they lack high-level semantic knowledge.
Deep-learning-based salient object detection
With the development of convolutional neural networks (CNNs), which have a powerful capacity for extracting multi-scale features at different levels containing both rich details and rich semantic cues, CNN-based SOD methods have achieved unprecedented success and been successfully applied to fabric defect detection. Xie et al. 45 divided the detection process into model training and defect localization, in which a stacked denoising convolutional auto-encoder was used for image reconstruction in the training stage, and the detected images were divided into several blocks for localization in the positioning stage. Girshick et al. 46 proposed the region-based convolutional neural network, using candidate regions plus convolutional neural networks instead of traditional handcrafted design to detect defects. Liu et al. 52 used convolutional neural networks to detect fabric flaws with a point-to-point approach, which performed well for fabric images with complex background textures. The deep saliency model developed by Wang et al. 53 incorporates self-attention mechanisms into a convolutional neural network to detect fabric defects. Liu et al. 54 used a CNN-based SOD model to capture fabric features, combined with low-rank models to reveal the defects. However, these methods concentrate only on designing a delicate structure to fuse multi-level features, ignoring how to extract powerful, discriminative features and how to fuse them effectively, which may result in undesirable prediction results.
Proposed method
The overall network architecture is shown in Figure 2, which mainly consists of four parts: 1) the initial feature extraction block (IFEB); 2) the context-aware multi-scale module (CAMS); 3) the feature fusion refinement module (FFR); and 4) multi-level deep supervision (MDS). The detailed process is as follows. First, IFEB pre-extracts low-level, mid-level, and high-level defect features at different scales, and its parameters are optimized through forward and backward propagation using the training fabric images and corresponding ground truth. Second, CAMS characterizes multi-scale, multi-receptive-field defect features via a series of parallel multi-scale feature aggregation units (MFAUs) with cascaded dilation layers, so that the output features of all levels have the same number of channels. Then, FFR transfers complementary and effective information from deeper layers to shallower layers through a set of attention fusion units (AFUs). To refine the defect boundaries and correctly segment the defect regions, the output features generated by AFU are fed into the auxiliary refinement unit (ARU). Finally, four side-output predictions are generated, and the first side output is chosen as the final prediction. Moreover, MDS is of great importance in the training phase, as it facilitates optimization and improves the performance of the proposed model.

Architecture of our proposed network.
Initial feature extraction block
The initial feature extraction block (IFEB) pre-extracts multi-scale defect features at different layers, applying a CNN as the initial network. CNNs have been widely adopted in computer vision, with VGG 55 and ResNet 56 the most popular backbone networks, but neither is applicable to every object detection task. Specifically, ResNet has outstanding classification performance, but its many layers and complex structure make the model more difficult to train and test due to its large number of parameters and heavy computational load. In contrast, VGG has excellent generalization capability and a relatively simple architecture, making it easier to train and deploy. Therefore, we choose VGG16 as our backbone network, which is constructed from only 13 convolutional layers and 3 fully connected (FC) layers. In our model, we cast away the three FC layers and remove the last pooling layer of VGG16 to preserve the details of the last convolutional layer in IFEB.
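The effect of dropping the last pooling layer on feature resolution can be checked with a short calculation. This is a sketch; the stage layout below is the standard VGG16 configuration, not code from the paper:

```python
# VGG16's 13 convolutional layers (stride 1, padding 1, so they preserve
# spatial size) are grouped into five stages, each normally followed by a
# 2x2 max pooling that halves the resolution.
VGG16_STAGE_CONVS = [2, 2, 3, 3, 3]

def deepest_feature_size(input_size, keep_last_pool=True):
    """Spatial size of the deepest feature map for a square input."""
    n_pools = len(VGG16_STAGE_CONVS) if keep_last_pool else len(VGG16_STAGE_CONVS) - 1
    size = input_size
    for _ in range(n_pools):
        size //= 2  # each retained 2x2 max pooling halves the resolution
    return size
```

With the last pooling removed, a 256x256 input yields 16x16 deepest features (overall stride 16) rather than 8x8 (stride 32), which keeps finer spatial detail for tiny and slender defects.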
Context-aware multi-scale module
Due to the complex and diverse texture of fabric and the varied sizes and irregular shapes of fabric defects, fabric defect detection is difficult and challenging. To realize accurate detection, we must utilize as much context information as possible to cope with the scale variation of fabric defects. In a CNN, context information is closely related to the size of the receptive field. Convolution kernels of different sizes have different receptive fields, and larger kernels incur larger computational loads. Fortunately, dilated convolution provides a promising solution, capturing more context information without increasing the amount of computation. Inspired by Yang et al., 57 Dong et al., 58 and Liu et al., 59 CAMS, composed of a series of parallel MFAUs, is designed to capture context-aware multi-scale defect features and fuse them effectively to cope with the scale variation of defects, especially tiny and linear defects.
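The receptive-field gain from dilation can be made concrete with a small helper. The dilation rates below are illustrative only; the rates actually used in MFAU are studied in the ablation section:

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 convolutions: each layer with
    kernel size k and dilation d spans (k - 1) * d + 1 inputs, so it grows
    the receptive field by (k - 1) * d while its parameter count stays that
    of a plain k x k kernel."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf
```

Three cascaded 3x3 convolutions with dilation rates 1, 2, 4 cover a 15-pixel span, whereas the same three layers without dilation cover only 7 pixels, at identical parameter cost; this is why dilated branches enlarge context without extra computation.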
The detailed structure of MFAU is shown in Figure 3. Formally, let

Detailed structure of MFAU. C denotes the concatenation.
The spatial information of the original feature maps and the
where
In addition, a residual connection is applied to avoid information loss. Then, the output features of the five branches are concatenated with the input features to effectively fuse multi-scale context information, better adapting to the scale variation of fabric defects. Finally, the fused features are processed by a convolution layer followed by a batch normalization (BN) 60 layer and a non-linear activation (ReLU) 61 layer. The final feature maps
where
Feature fusion refinement module
The output features generated by MFAU may suffer from information redundancy; we should remove this redundancy to avoid interference, prevent inaccurate predictions, and improve detection speed. Moreover, different layers have different characteristics, so the complementary information between layers plays an important role in accurate fabric defect detection. However, how to effectively utilize this cross-scale complementary information and how to enhance the discriminative capability of defect features remain two great challenges. More importantly, defect boundaries are usually blurry due to low contrast with the background texture, which can result in incomplete or even false segmentation results. To mitigate these issues, FFR is proposed to enhance the discrimination of defect features and generate clearer boundaries. It mainly contains two parts: an attention fusion part, which selects important information for integration while removing redundancy, and a feature refinement part, which enhances the input features and refines boundaries.
First, we design AFU for the attention fusion part, whose details are shown in Figure 2. Formally,
Then, ARU is designed for the refinement part, whose structure is illustrated in Figure 4; it takes

The architecture of ARU.
where
Multi-level deep supervision
Recent works have pointed out that the effective integration of multi-scale features is essential for saliency detection.62,63 Specifically, low-level features contain rich details and high-level features contain rich semantic information, but this information may be diluted or removed after the pooling layers. Therefore, to reconstruct this information, we further optimize the side outputs to compensate for the dilution and loss of information. Through interpolation, MDS generates supervised defect feature maps with the same spatial resolution as the fabric defect image, hence accelerating feature learning and improving defect detection performance. Moreover, we apply the binary cross-entropy with logits function as our loss function to mitigate the class-imbalance problem, and it can be calculated by:
where
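For reference, a numerically stable form of binary cross-entropy on raw logits can be sketched as follows. This is a generic sketch of the standard "with logits" formulation; in the paper this loss is applied to each supervised side output:

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Binary cross-entropy computed directly on logits x against binary
    targets y, using the numerically stable identity
    max(x, 0) - x * y + log(1 + exp(-|x|)), averaged over all pixels."""
    x = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    loss = np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return float(loss.mean())
```

Working on logits rather than sigmoid probabilities avoids log(0) overflow for confident predictions, which matters when most pixels are easy background, as in class-imbalanced fabric images.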
Experimental results
Experimental setup
Evaluation datasets
We evaluate our model on two fabric datasets we built: a plain dataset and a pattern dataset. The plain dataset comprises 2200 images for training and 500 images for testing, and includes many slender defects with low contrast against the texture background as well as tiny defects. The pattern dataset contains 5948 images for training and 500 images for testing, with more complex background textures and irregularly shaped defects. For data augmentation, we adopt horizontal flipping and rotation to alleviate the over-fitting risk. Note that the plain dataset mainly contains five types of defects: indentation, crease, off-line, stains, and holes. The pattern dataset contains six types of defects: yarn shedding, yarn breakage, yarn belt, cotton ball, holes, and stains. In general, our datasets are challenging, containing many defects with low contrast against the background texture, many small-scale defects, and many different defect types.
Evaluation metrics
To evaluate the performance of our proposed method and other saliency models, we adopt six widely used metrics: precision-recall (PR) curves, F-measure curves, maximum F-measure (
where
MAE 65 is defined as the average pixel-wise absolute difference between the predicted saliency map S and the binary ground truth L (the smaller, the better); it can be expressed as
where W and H denote the width and height of a given image, respectively.
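The MAE metric as defined above reduces to a one-liner. This sketch matches the definition, with S and L given as arrays of values in [0, 1]:

```python
import numpy as np

def mae(saliency, ground_truth):
    """Average pixel-wise absolute difference between the predicted
    saliency map S and the binary ground truth L over all W * H pixels."""
    s = np.asarray(saliency, dtype=float)
    l = np.asarray(ground_truth, dtype=float)
    return float(np.abs(s - l).mean())
```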
S-measure (
where
E-measure (
where
A better saliency detection model should have a larger F-measure, a larger S-measure, a larger E-measure, and a smaller MAE. For a fair comparison, we report the values of all metrics to comprehensively evaluate the detection performance of all saliency models.
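For reference, the weighted F-measure commonly used in SOD evaluation combines precision and recall as below, with beta^2 = 0.3 to emphasize precision (the usual convention in the SOD literature); maximum F-measure takes the maximum of this score over all binarization thresholds of the saliency map:

```python
def f_beta(precision, recall, beta_sq=0.3):
    """Weighted F-measure: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```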
Implementation details
The VGG16 pre-trained on ImageNet is used as the backbone network. In the training phase, we randomly initialize the weights of each convolution layer from a standard normal distribution. In addition, to avoid weight values so large that they hinder model learning, we multiply the weights by a constant of 0.01, and the biases are initialized to 0. We implement our model on the PyTorch platform and train on an NVIDIA V100 GPU for 43 epochs. The hyper-parameters are: batch size (4), epochs (43), momentum (0.9), weight decay (5e-4), and learning rate (5e-5). The following experiments are conducted with all these parameters fixed. Additionally, the network is optimized with stochastic gradient descent (SGD) during training. We apply data augmentation such as random flipping and multi-scale input images to alleviate the over-fitting risk. All the images are resized into
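The training update implied by these hyper-parameters can be sketched as one step of momentum SGD with classic (coupled) L2 weight decay. This is a generic sketch of that update rule, not code from the paper:

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=5e-5, momentum=0.9, weight_decay=5e-4):
    """One momentum-SGD update with the hyper-parameters listed above.
    Weight decay is folded into the gradient (classic coupled L2
    regularization), and the velocity buffer accumulates it."""
    g = grad + weight_decay * w
    velocity = momentum * velocity + g
    return w - lr * velocity, velocity
```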
Performance comparison with state-of-the-art
We compare the proposed model with 14 previous state-of-the-art salient object detection approaches on our two datasets, including NLDF, 66 DSS, 36 R3Net, 63 BASNet, 67 PiCANet, 64 GateNet, 68 PoolNet, 59 RAS, 69 AADF, 70 F3Net, 71 GCPA, 72 C2FNet, 73 PSGLoss, 74 PFNet, 75 and ICON. 76 For fair comparisons, we run public codes with released training models or use implementations with the parameters recommended by the authors.
Visual comparison
In Figure 5, we present a visual comparison between the proposed method and the other approaches on our two fabric datasets. Benefiting from the feature pyramid structure of CAMS and its powerful feature extraction capability, our model can detect defects of different shapes and sizes. It can be clearly seen that our model not only clearly highlights the defect regions of all kinds of fabric defects, especially linear defects (see 5th, 7th, 8th, 11th rows), but also well outlines the boundaries of defects of all sizes, such as tiny defects (see 1st, 2nd, 6th, 9th, 10th, 12th rows), medium-sized defects (see 3rd, 8th rows), and large defects (see 4th row). In particular, as shown in the last row, some models detect defects with noise (see BASNet, PoolNet, and GateNet) or cannot even identify the defects (see NLDF, R3Net, and AADFNet) when faced with tiny defects of low contrast with the background. In contrast, our model can not only effectively suppress redundant information to accurately detect and locate defects, but also outline them clearly and completely. Importantly, our model also achieves excellent detection performance in low-contrast cases (see 7th, 9th, 10th, 11th, 12th rows), where the fabric defects blend into the background texture. The reason our model achieves superior performance is that the proposed FFR removes redundancy, pays more attention to important and salient regions, and refines features, forcing our model to learn more useful features and generate more accurate and complete saliency maps. In addition, MDS enhances the generality of our model, making it more robust to variations in input images. These advantages bring our results closer to the ground truth, making them superior to those of other state-of-the-art saliency detection methods.

Visual comparisons of the proposed method and the state-of-the-art methods.
Quantitative comparison
We compare the quantitative evaluation results with the state-of-the-art saliency detectors on our two fabric test datasets in terms of
Comparison of different methods on three metrics including
The best results are shown in bold.
Results comparison with different backbone networks on plain and pattern datasets.
The best results are shown in bold.

PR curves and F-measure curves on two fabric datasets.
Model performance
Our network takes only about 0.056 seconds to process an image on an NVIDIA GTX 1080Ti GPU. For fair comparison, all competing methods are re-implemented using the source code provided by the authors with the same hardware configuration. Our model runs at a speed of 18 FPS (frames per second) when processing a
Model performance compared with other models.
FPS: frames per second.
Ablation study
With different model settings, a series of experiments is conducted in this section to investigate the effectiveness of the context-aware multi-scale module (CAMS), the feature fusion refinement module (FFR), and multi-level deep supervision (MDS). All models in this section are trained on the augmented fabric datasets and share the same hyper-parameters described in subsection 4.1. We first conduct a series of experiments with different widely used backbone networks; the results are shown in Table 3. As can be seen, our selected backbone network achieves the best results on both fabric datasets. Figure 7 shows the feature maps of different modules, which indicates that all modules are beneficial for performance improvement. Table 5 presents the effectiveness of CAMS and FFR in terms of
Quantitative results for different networks presented in the ablation study on plain dataset.

Visual comparison of saliency maps showing the effectiveness of each module. The first column is the original image, the second column is the ground truth, and the third to last columns are the output results after IFEB, CAMS, and FFR, respectively.
Effectiveness of CAMS
We introduce CAMS to expand the receptive field without increasing the computational load, which is conducive to capturing more context-aware information to cope with variations in defect scale. Furthermore, we evaluate the performance of CAMS with respect to different dilation rates
The comparisons of parameters
The best results are shown in bold.
First, we investigate the effectiveness of different dilation rates
In addition, we study the fusion strategy of MFAU in CAMS. Concatenation is a widely used technique for feature aggregation that extends the feature matrix dimension. In this model, we use concatenation to fuse multi-scale, multi-receptive-field features progressively. Figure 8(d) and (h) shows the visual saliency maps after the concatenation operation. It can be clearly seen that the model with concatenation in MFAU locates fabric defects accurately and depicts their boundaries clearly. Moreover, all the experimental results presented in this paper testify to the effectiveness and superiority of MFAU with concatenation, which is capable of characterizing contextual multi-receptive-field features at different levels for fabric defect detection.

Comparison of saliency maps between different MFAU fusion strategies: (a) image, (b) GT, (c) summation, (d) concatenation, (e) image, (f) GT, (g) summation, and (h) concatenation.
Summation is another commonly used feature fusion technique, which fuses the responses of different features while preserving the original information. We change the concatenation operation to summation when performing feature aggregation in MFAU; the resulting feature maps are shown in Figure 8(c) and (g). As can be seen in Figure 8, the feature maps produced by summation have unclear defect boundaries and incomplete defect regions compared with those produced by concatenation. Furthermore, Table 7 shows that concatenation achieves better results than summation. Therefore, we conclude that, compared with concatenation, summation cannot adequately enhance the discrimination and representation ability of defect features across the dilated convolution layers of MFAU. In view of this analysis, we adopt concatenation in this paper to integrate the multi-scale features of MFAU.
The best results are shown in bold.
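The difference between the two fusion strategies can be sketched with simple array operations. The following is a minimal NumPy illustration only; the branch count and channel sizes are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Hypothetical multi-receptive-field feature maps from three parallel
# dilated-convolution branches (batch=1, channels=8, height=16, width=16).
branches = [np.random.rand(1, 8, 16, 16) for _ in range(3)]

# Summation fusion: responses are merged element-wise, so the channel
# dimension stays at 8 and branch-specific responses are blended together.
fused_sum = sum(branches)

# Concatenation fusion (the strategy adopted in MFAU): features are stacked
# along the channel axis, extending the feature dimension to 3 * 8 = 24 and
# preserving each branch's responses for subsequent layers to weigh.
fused_cat = np.concatenate(branches, axis=1)

print(fused_sum.shape)  # (1, 8, 16, 16)
print(fused_cat.shape)  # (1, 24, 16, 16)
```

Because concatenation keeps each branch's responses in separate channels, the following convolution can learn per-branch weights instead of receiving an already-blended map, which matches the observation above that concatenation better preserves discriminative defect features.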
Effectiveness of FFR
To prove the effectiveness of FFR and its two main components, we compare three variants; the results are reported in Table 6 and Figure 9. From Table 6, it can be seen that after adding AFU and ARU on the basis of CAMS, the scores of the various indicators increase steadily, except for the MAE score, which proves the effectiveness of the two main components of FFR. In particular, as Figure 9 shows, the feature maps generated by the full FFR are clearer than those of FFR without ARU, with sharper boundaries and less texture information. Furthermore, from the fifth column of Figure 7, we can see that the feature maps after FFR are much clearer and more accurate compared with those in the first two columns. Therefore, we can conclude that the feature fusion part of our proposed FFR attends to important defect regions and effectively aggregates cross-level features, which enables the model to learn more useful features and further enhances its feature representation capability. Additionally, the refinement part can further refine the features to generate more accurate saliency maps with clear boundaries (Table 8).
The best results are shown in bold.

Visual feature maps of AFU with ARU and without ARU: (a) AFU only and (b) AFU + ARU.
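The internals of AFU and ARU are not spelled out in this section, so the following is only a schematic NumPy sketch of the general pattern described above: attention-gated fusion of cross-level features followed by a small residual refinement. The function names, the sigmoid gating, and the residual form are all illustrative assumptions rather than the paper's actual layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_fuse(low, high):
    """Schematic AFU-style fusion (assumed form): a sigmoid attention map
    derived from the high-level feature gates the low-level feature before
    merging, suppressing background texture while keeping defect responses."""
    attn = sigmoid(high)          # attention weights in (0, 1)
    return low * attn + high      # gated low-level plus high-level features

def refine(fused, residual_scale=0.1):
    """Schematic ARU-style refinement (assumed form): a small residual
    correction stands in for the learned refinement convolutions that
    sharpen boundaries in the fused feature."""
    return fused + residual_scale * np.tanh(fused)

low = np.random.rand(1, 8, 16, 16)   # low-level feature with texture detail
high = np.random.rand(1, 8, 16, 16)  # high-level semantic feature
out = refine(attention_fuse(low, high))
print(out.shape)  # (1, 8, 16, 16)
```

The point of the sketch is the two-stage structure the ablation isolates: removing the refinement stage (ARU) leaves the gated fusion output unsharpened, which mirrors the blurrier boundaries seen in Figure 9(a).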
Effectiveness of MDS
To demonstrate the effectiveness of the adopted multi-level deep supervision (MDS), we compare it with single-level deep supervision (SDS). Note that only the supervision learning differs; the remaining steps are the same. Whereas MDS applies deep supervision learning to each side-output prediction map, SDS supervises the loss function only on the last layer. In other words, MDS uses more loss functions to supervise and guide the model to locate the defect regions more accurately and to learn useful defect features during training. Here, we show the quantitative results and visual saliency maps in Table 9 and Figure 10. We can clearly see from Figure 10 that, compared with Ours(SDS), the side outputs of the four layers obtained by Ours(MDS) are able to locate the defect regions and sketch the defect boundaries well simultaneously. These features generated by Ours(MDS) are beneficial for better defect localization. In addition, Table 9 presents the experimental results of Ours(SDS) and Ours(MDS) in terms of different evaluation metrics.
The best results are shown in bold.
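The contrast between MDS and SDS can be expressed as a toy loss computation. The sketch below uses pure Python with a pixel-wise binary cross-entropy; the four-level setup and the uniform level weights are illustrative assumptions, not the paper's exact loss configuration:

```python
import math

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between one predicted saliency value and its
    ground-truth label, both in [0, 1], with clipping for stability."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

def mds_loss(side_outputs, target, weights=None):
    """Multi-level deep supervision: every side-output prediction map is
    scored against the same ground truth and the per-level losses are
    summed (optionally weighted). SDS would instead pass only the final
    prediction map, i.e. mds_loss([side_outputs[-1]], target)."""
    if weights is None:
        weights = [1.0] * len(side_outputs)
    total = 0.0
    for w, pred_map in zip(weights, side_outputs):
        level_loss = sum(bce(p, t) for p, t in zip(pred_map, target)) / len(target)
        total += w * level_loss
    return total

# Toy example: four side outputs (flattened 4-pixel maps) vs. a binary GT mask.
gt = [1.0, 0.0, 1.0, 0.0]
sides = [[0.90, 0.10, 0.80, 0.20],   # shallowest side output
         [0.80, 0.20, 0.90, 0.10],
         [0.95, 0.05, 0.90, 0.10],
         [0.97, 0.03, 0.95, 0.05]]   # deepest side output
print(mds_loss(sides, gt))           # MDS: all four levels supervised
print(mds_loss([sides[-1]], gt))     # SDS: only the last layer supervised
```

Because MDS back-propagates a gradient into every side output rather than only the deepest one, each intermediate layer receives a direct training signal, which is why the shallow side outputs in Figure 10 can already localize defects under Ours(MDS).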

Visual feature maps and saliency maps of different supervision learning techniques.
Conclusion
In this paper, we propose a novel end-to-end saliency-based fabric defect detection network named CPA 2 Net, which mainly comprises two components: a context-aware multi-scale module (CAMS) and a feature fusion refinement module (FFR). CAMS, composed of several parallel multi-scale feature aggregation units (MFAU) with dilated convolution layers, is proposed to capture context-aware multi-scale, multi-receptive-field information to cope with the variation of defect scales. To further enhance the discriminative capability of defect features, FFR is proposed to effectively fuse and refine features, where the AFU with feedback mechanisms is designed to remove redundancy and exploit the complementary information of different layers, and the ARU is designed to further refine the features generated by AFU while highlighting the defect boundaries. Moreover, we adopt multi-level deep supervision (MDS) to supervise each side output, guiding training and generating more accurate and complete prediction maps. Extensive experimental results demonstrate that our proposed model achieves state-of-the-art performance in both quantitative and qualitative evaluations. In the future, more experiments will be conducted to further improve the accuracy and stability of this model, or to compress it into a lightweight model while retaining its existing performance. We hope our proposed method provides promising future research directions for fabric defect detection and other related research fields.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by NSFC (No. 61772576, No. 62072489, U1804157), the Henan science and technology innovation team (CXTD2017091), IRTSTHN (21IRTSTHN013), the 2022 Henan Province Key R&D and Promotion Special Project (Science and Technology Research) Program (222102210008), and the ZhongYuan Science and Technology Innovation Leading Talent Program (214200510013).
