Sage Journals: Discover world-class research

Abstract

Fabric defect detection algorithms based on object detection have become a research hotspot. Methods based on the Single Shot MultiBox Detector (SSD) model are fast, but their detection accuracy is insufficient. To balance detection speed and accuracy and meet the practical needs of industry, this study proposes an improved fabric defect detection algorithm based on SSD. A Fully Convolutional Squeeze-and-Excitation (FCSE) block is added to the traditional SSD to improve detection accuracy, and the number of default boxes is adjusted to accommodate long strip defects on fabric surfaces. Experimental results on the TILDA and Xuelang datasets confirm that our SSD-based method efficiently detects various fabric defects.

Keywords

Computer Vision; Fabric Defects; FCSE; Object Detection; SSD

Introduction

Fabric defect detection plays an irreplaceable role in textile production lines. However, many textile companies still rely on manual inspection. Inspectors can easily find defects in fabrics by direct observation,¹ but prolonged observation fatigues the eyes and leads to a growing number of inadvertently missed defects. To meet the needs of modern industry, it is important to develop fabric defect detection methods based on computer vision.

Since the 1980s, many researchers have studied fabric defect detection. Fabric defect detection algorithms fall mainly into five categories: 1) statistical methods (e.g., the gray-level co-occurrence matrix² and mathematical morphology³); 2) spectral methods, mainly the Fourier⁴ and Gabor⁵ transforms; 3) model-based methods, such as the autoregressive model⁶ and the Markov stochastic model;⁷ 4) structural methods;⁸ and 5) learning-based methods (e.g., neural networks^9–14 and the support vector machine¹⁵).

Recently, various optimization methods have emerged,¹⁶ and the performance of learning methods based on convolutional neural networks (CNNs)¹⁷ has improved greatly. Rebhi et al.¹⁰ use feed-forward neural networks to classify image blocks after a discrete cosine transform. Gao et al.¹¹ propose a CNN with multiple convolution and max-pooling layers to detect defects in solid-color woven fabrics. Jing et al.¹² and Ouyang et al.¹³ train their models mainly on public fabric datasets and images collected from textile companies, and obtain good detection results. Xie et al.¹⁴ applied the unsupervised SDCAE model to fabric defect detection; combining an image pyramid and a direction template in the model improves detection accuracy on solid-color and periodic-texture fabrics.

As one of the most fundamental and challenging problems in computer vision, object detection has been widely applied in various industrial fields, such as surveillance,¹⁸ autonomous driving,¹⁹ and surface defect detection of magnetic tiles.²⁰ Existing object detectors can usually be divided into two categories:²¹ two-stage detectors (e.g., Faster R-CNN²² and Mask R-CNN²³) and one-stage detectors (e.g., You Only Look Once (YOLO),^24–26 Single Shot MultiBox Detector (SSD),²⁷ and RetinaNet²⁸). In two-stage detectors, the two stages are separated by the Region of Interest (RoI) pooling layer. For example, in Faster R-CNN, the first stage, called the Region Proposal Network (RPN), proposes candidate object bounding boxes. In the second stage, features are extracted from each candidate box by an RoIPool (RoI pooling) operation for the subsequent classification and bounding-box regression tasks.²³ In contrast, one-stage detectors predict boxes directly from input images without a region proposal step; they are therefore time efficient and can be used for real-time detection. In particular, SSD with a VGG-16²⁹ backbone achieved competitive results in both mean average precision (mAP) and speed. SSD achieved an mAP of 81.6% on the PASCAL VOC 2007 test set and 80.0% on the PASCAL VOC 2012 test set, compared with 78.8% and 75.9% for Faster R-CNN and 57.9% on VOC 2012 for YOLO.²⁷ On the MS COCO detection dataset, SSD512 outperformed Faster R-CNN under all evaluation criteria.

Recently, more and more CNN-based object detection models have been applied to fabric defect detection with remarkable results. Liu et al.³⁰ use the two-stage detector Faster R-CNN as the main model and apply several data augmentation methods, such as image cropping, rotation, and noise addition, to alleviate overfitting. Liu et al.³¹ propose a texture defect detection method based on Faster R-CNN and feature fusion: before the fully connected layer, they combine dimension-reduced Histogram of Oriented Gradients (HOG) features with the features of the candidate object bounding boxes. Liu et al.³² introduced the one-stage detector SSD to fabric defect detection for the first time and added the third-level feature map conv3_3 to the feature pyramid to detect small objects. Zhang et al.³³ compared three YOLO V2 variants and selected the best model for detection. Among fabric defect detection algorithms based on object detection models, two-stage detectors usually achieve high localization and recognition accuracy but are slower than one-stage detectors, whereas one-stage detectors are fast but slightly less accurate.³⁴

To balance detection speed and accuracy and meet the practical needs of industry, this work proposes an improved fabric defect detection model based on SSD. The Fully Convolutional Squeeze-and-Excitation (FCSE) block, an improved version of the module in the Squeeze-and-Excitation network,³⁵ is added to the traditional SSD to increase the weights of the feature map channels where defective regions are located. This improves detection accuracy while maintaining the high-speed detection of SSD. In addition, we adjust the number of aspect ratios (and hence default boxes) on each feature map to fit long strip defects. To reduce overfitting during training, we apply data augmentation. Experimental results show that our model achieves better detection results on the TILDA dataset,³⁶ reaching 63.7% mAP on defect regions at 25.6 frames per second (FPS) on the test set. On the Xuelang dataset,³⁷ our model achieves 47.1% mAP at 19.5 FPS, which shows that the proposed model can detect defects on the fabric surface even when images contain difficult objects.

The remainder of this paper is organized as follows. An overview of the traditional SSD network is followed by the network architecture of our SSD-based fabric defect detection method. Next come the experimental results and analysis, followed by the conclusion.

Overview of SSD

After Faster R-CNN was proposed, many model variants sought to improve how candidate regions are selected. Liu et al.²⁷ proposed SSD, which places default boxes on multi-scale feature maps. As shown in Fig. 1, the SSD architecture consists of the base feature layers, the extra feature layers, the convolutional predictors for detection (called the “head”), and non-maximum suppression (NMS).

Fig. 1

The network architecture of SSD.

Basic Feature and Extra Feature Layers

The base feature extraction layers of SSD use a pre-trained VGG16 model. SSD converts fc6 and fc7 into convolutional layers, changes pool5 from 2×2 with stride 2 (2×2-s2) to 3×3 with stride 1 (3×3-s1), and removes all dropout layers and the fc8 layer.

After the base feature extraction layers, SSD adds eight convolution layers, conv8_1, conv8_2, conv9_1, conv9_2, conv10_1, conv10_2, conv11_1, and conv11_2, as the extra feature extraction layers.
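The spatial sizes of the six prediction maps follow from these layer changes. The sketch below traces a 300×300 input through the pooling and extra layers with the standard convolution output-size formula. The kernel, stride, padding, and dilation values (ceil-mode pool3, a dilated 3×3 conv6 with padding 6, and 3×3 extra layers with strides 2, 2, 1, 1) are assumptions taken from the commonly used SSD300 configuration; this paper does not list them.

```python
import math


def conv_out(size, kernel, stride=1, pad=0, dilation=1, ceil_mode=False):
    """Output side length of a square conv/pool layer (standard formula)."""
    span = size + 2 * pad - dilation * (kernel - 1) - 1
    return (math.ceil(span / stride) if ceil_mode else span // stride) + 1


def ssd300_feature_sizes(input_size=300):
    """Trace the input through SSD300 (assumed configuration). 3x3 convs with
    padding 1 keep the size, so only the layers that change it are listed."""
    s = conv_out(input_size, 2, 2)            # pool1: 300 -> 150
    s = conv_out(s, 2, 2)                     # pool2: 150 -> 75
    s = conv_out(s, 2, 2, ceil_mode=True)     # pool3 (ceil mode, assumed): 75 -> 38
    sizes = {"conv4_3": s}
    s = conv_out(s, 2, 2)                     # pool4: 38 -> 19
    s = conv_out(s, 3, 1, pad=1)              # pool5, changed to 3x3-s1: 19 -> 19
    s = conv_out(s, 3, 1, pad=6, dilation=6)  # conv6 (from fc6, dilation assumed): 19 -> 19
    s = conv_out(s, 1)                        # conv7 (from fc7, 1x1): 19 -> 19
    sizes["conv7"] = s
    # Each extra block: a 1x1 conv (size unchanged), then a 3x3 conv (assumed stride/pad).
    for name, stride, pad in [("conv8_2", 2, 1), ("conv9_2", 2, 1),
                              ("conv10_2", 1, 0), ("conv11_2", 1, 0)]:
        s = conv_out(conv_out(s, 1), 3, stride, pad)
        sizes[name] = s
    return sizes


print(ssd300_feature_sizes())
# {'conv4_3': 38, 'conv7': 19, 'conv8_2': 10, 'conv9_2': 5, 'conv10_2': 3, 'conv11_2': 1}
```

Without ceil mode, pool3 would give 37 instead of 38, which is why that flag is spelled out.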

Multi-Scale Feature Map and Default Box

SSD uses the six feature maps conv4_3, conv7 (fc7), conv8_2, conv9_2, conv10_2, and conv11_2 to predict both locations and confidences: each map passes through a convolutional predictor (“head”) that outputs, for every default box, the predicted category scores and location offsets.

The convolutional predictor (“head”) is a sub-network for regression and classification consisting of two parallel convolution layers. The output size of each “head” is m × n × (k × (cls + 4)), where m and n are the width and height of the feature map, k is the number of default boxes per location (one per aspect ratio, plus one extra square box), cls is the number of categories, and 4 is the number of location offsets. As shown in Fig. 2, k is set to 4, 6, 6, 6, 4, and 4 for the six “head” structures of the original SSD model, and a total of 8732 default boxes, each with cls + 4 predicted values, are generated.
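As an arithmetic check of the 8732 figure, the sketch below multiplies each feature map's area by its k value. The map sizes 38, 19, 10, 5, 3, and 1 are assumed from the standard SSD300 configuration, and `num_classes=21` (20 PASCAL VOC classes plus background) is only an illustrative value.

```python
# Assumed SSD300 feature-map sizes: conv4_3, conv7, conv8_2, conv9_2, conv10_2, conv11_2.
FEATURE_SIZES = [38, 19, 10, 5, 3, 1]
ORIGINAL_K = [4, 6, 6, 6, 4, 4]  # default boxes per location in the original SSD


def head_output_shapes(sizes, ks, num_classes):
    """Shape m x n x (k * (cls + 4)) of each 'head' output (square maps, m == n)."""
    return [(m, m, k * (num_classes + 4)) for m, k in zip(sizes, ks)]


def total_default_boxes(sizes, ks):
    """Each of the m * m locations of a map carries k default boxes."""
    return sum(m * m * k for m, k in zip(sizes, ks))


print(total_default_boxes(FEATURE_SIZES, ORIGINAL_K))                    # 8732
print(head_output_shapes(FEATURE_SIZES, ORIGINAL_K, num_classes=21)[0])  # (38, 38, 100)
```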

Fig. 2

Our network architecture.

In the testing stage, the positive default boxes predicted by SSD are filtered by NMS to obtain the final detection results.
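NMS itself is a short greedy loop. The following is a minimal per-class sketch, assuming boxes given as (xmin, ymin, xmax, ymax); the overlap threshold 0.45 and the cap of 200 kept boxes are the values reported for the original SSD, not settings stated in this paper.

```python
def iou(a, b):
    """Jaccard overlap of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45, top_k=200):
    """Greedy NMS for one class (thresholds assumed); returns kept indices, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < top_k:
        best = order.pop(0)  # highest remaining score
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

In SSD this runs separately for each defect class on the boxes whose confidence exceeds a small threshold.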

MultiBox Loss

During training, SSD first uses a matching strategy to label the default boxes. First, each ground-truth box is matched to the default box with the best Jaccard overlap (intersection over union); then, each remaining default box is matched to any ground truth whose Jaccard overlap with it exceeds a threshold (0.5). Next, SSD uses hard negative mining to filter the negative boxes and keeps the ratio of positive to negative samples at 1:3, which addresses the imbalance between sample categories. Finally, the labels and predicted values of these default boxes are fed into the loss function to compute the loss, which is backpropagated to update the weights.
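The matching and mining steps can be sketched as follows. This is an illustrative pure-Python version, not the authors' code: boxes are (xmin, ymin, xmax, ymax) tuples, and `neg_losses` is assumed to hold each default box's confidence loss for the background class, which is what hard negative mining ranks on.

```python
def iou(a, b):
    """Jaccard overlap of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(gt_boxes, default_boxes, threshold=0.5):
    """Label each default box with a ground-truth index, or -1 for background."""
    overlaps = [[iou(g, d) for d in default_boxes] for g in gt_boxes]
    matches = [-1] * len(default_boxes)
    # Threshold rule: a default box takes its best ground truth if IoU > threshold.
    for d in range(len(default_boxes)):
        best_g = max(range(len(gt_boxes)), key=lambda g: overlaps[g][d], default=None)
        if best_g is not None and overlaps[best_g][d] > threshold:
            matches[d] = best_g
    # Best-overlap rule (applied last so it wins): every ground truth keeps
    # at least its best-overlapping default box, even below the threshold.
    for g in range(len(gt_boxes)):
        best_d = max(range(len(default_boxes)), key=lambda d: overlaps[g][d])
        matches[best_d] = g
    return matches


def hard_negative_mining(matches, neg_losses, neg_pos_ratio=3):
    """Keep all positives and the highest-loss negatives, at most 3 per positive."""
    positives = [i for i, m in enumerate(matches) if m >= 0]
    negatives = sorted((i for i, m in enumerate(matches) if m < 0),
                       key=lambda i: neg_losses[i], reverse=True)
    return positives, negatives[:neg_pos_ratio * len(positives)]


gt = [(0, 0, 10, 10)]
defaults = [(0, 0, 10, 10), (20, 20, 30, 30), (5, 5, 15, 15), (40, 40, 50, 50), (0, 0, 5, 5)]
m = match(gt, defaults)
print(m)                                                     # [0, -1, -1, -1, -1]
print(hard_negative_mining(m, [0.1, 0.5, 2.0, 0.05, 1.2]))  # ([0], [2, 4, 1])
```

With one positive, only the three negatives with the largest background loss survive, giving the 1:3 ratio.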

Similar to Fast R-CNN, the overall objective loss function is given in Eq. 1.

L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right)

Eq. 1

N is the number of matched (positive) default boxes; x indicates which default boxes are matched to which ground-truth boxes of each category; l is the predicted box offsets; g is the ground-truth (label) offsets; c is the class confidences, converted into multi-class probabilities by a softmax classifier; L_loc is the smooth L1 loss; L_conf is the cross-entropy loss; and α is the weighting coefficient between the classification (confidence) loss and the location loss, usually set to 1.
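A minimal sketch of Eq. 1 for a handful of boxes follows. It assumes labels are integers with 0 meaning background, location targets already encoded as offsets, and `selected` listing the positives plus the mined hard negatives; it sums the per-box terms and divides by N, matching the 1/N factor.

```python
import math


def smooth_l1(x):
    """Smooth L1 loss of one residual."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def cross_entropy(logits, label):
    """Softmax cross-entropy of one box, via a numerically stable log-sum-exp."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum_exp - logits[label]


def multibox_loss(conf_logits, labels, loc_pred, loc_target, selected, alpha=1.0):
    """Eq. 1: (L_conf + alpha * L_loc) / N.

    labels[i] > 0 marks a positive (matched) box and 0 marks background (assumed
    encoding); L_conf covers the boxes in `selected`, L_loc covers positives only."""
    positives = [i for i, y in enumerate(labels) if y > 0]
    n = len(positives)
    if n == 0:
        return 0.0  # the original SSD sets the loss to 0 when nothing is matched
    l_conf = sum(cross_entropy(conf_logits[i], labels[i]) for i in selected)
    l_loc = sum(smooth_l1(p - t)
                for i in positives
                for p, t in zip(loc_pred[i], loc_target[i]))
    return (l_conf + alpha * l_loc) / n


loss = multibox_loss(
    conf_logits=[[0.0, 0.0], [0.0, 0.0]],  # two boxes, background + one defect class
    labels=[1, 0],                          # box 0 is positive, box 1 a hard negative
    loc_pred=[[0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0]],
    loc_target=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    selected=[0, 1],
)
print(round(loss, 4))  # 2*ln(2) + 1.625 = 3.0113
```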

Improved Fabric Defect Detection Method Based on SSD

In this section, we introduce our SSD-based fabric defect detection algorithm. We first present the overall structure of the model, then describe the structural improvements in the order of the training process, and finally give some implementation details.

Network Architecture

As shown in Fig. 2, the model includes the VGG16 base feature layers, the extra feature extraction layers, and the convolutional predictors for detection (the “head”). The network input is a 300×300-pixel image; conv1_1, conv1_2, …, conv6, and conv7 are the base feature layers of the original SSD, and conv8_1, conv8_2, …, conv11_1, and conv11_2 are the extra feature layers. Six feature maps of different scales, conv4_3, conv7, conv8_2, conv9_2, conv10_2, and conv11_2, are used to predict the classification result and the four location offsets of each default box. Compared with the original SSD, our structure has the following improvements:

The number of aspect ratios is increased to fit long strip defects (which have larger aspect ratios), and the convolution layers of the “head” substructures are adjusted accordingly.

Following the idea of SENet, we add a channel attention mechanism, the FCSE block, after each of the last eight convolution layers (the extra feature layers).

Aspect Ratio

As in the original SSD, we generate default boxes on six feature maps. Because the dataset images contain strip defects, the aspect ratios 4 and 1/4 are added to some layers so that the model can locate such defects accurately. As shown by the green feature maps in Fig. 2, the output channels of the six “head” structures are adjusted accordingly. The values of k for the six feature maps are set to 4, 6, 8, 8, 6, and 4, respectively, and a total of 9000 default boxes are generated.

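The default boxes are generated as follows. For the k-th feature map, the scale of the default boxes is determined by Eq. 2, where m = 6 is the number of feature maps and s_min and s_max are the scales of the lowest and highest layers.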

s_{k} = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1} (k - 1), \quad k \in [1, 6]

Eq. 2

The aspect ratios are $a_{r} \in \{1, 2, \frac{1}{2}, 3, \frac{1}{3}, 4, \frac{1}{4}\}$. Each location on the k-th feature map generates seven default boxes, one per aspect ratio, with width $s_{k} \sqrt{a_{r}}$ and height $s_{k} / \sqrt{a_{r}}$; an additional square default box of size $\sqrt{s_{k} s_{k + 1}} \times \sqrt{s_{k} s_{k + 1}}$ is also added.

On the TILDA dataset, only $\{1, 2, \frac{1}{2}\}$ is retained as $a_{r}$ for conv4_3 and conv11_2, and $\{1, 2, \frac{1}{2}, 3, \frac{1}{3}\}$ for conv7 and conv10_2, so the six feature maps generate 5776, 2166, 800, 200, 54, and 4 boxes, respectively, for a total of 9000 default boxes. The “head” predicts the cls category scores and four offsets for each default box.
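The box layout described above can be reproduced with a small generator. This sketch follows Eq. 2 with the per-layer aspect ratios listed for TILDA; s_min = 0.2 and s_max = 0.9 are assumed from the original SSD paper, the 38/19/10/5/3/1 map sizes are assumed from SSD300, and extending Eq. 2 to k = m + 1 to obtain s_{k+1} for the last layer is also an assumption. Boxes are (cx, cy, w, h) in normalized image coordinates.

```python
import math

FEATURE_SIZES = [38, 19, 10, 5, 3, 1]  # assumed SSD300 map sizes
RATIOS = [
    [1, 2, 1/2],                  # conv4_3
    [1, 2, 1/2, 3, 1/3],          # conv7
    [1, 2, 1/2, 3, 1/3, 4, 1/4],  # conv8_2
    [1, 2, 1/2, 3, 1/3, 4, 1/4],  # conv9_2
    [1, 2, 1/2, 3, 1/3],          # conv10_2
    [1, 2, 1/2],                  # conv11_2
]


def scales(m, s_min=0.2, s_max=0.9):
    """Eq. 2 for k = 1..m, plus an extrapolated s_{m+1} (assumption) for the last layer."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 2)]


def default_boxes(sizes=FEATURE_SIZES, ratios=RATIOS):
    """Return default boxes as (cx, cy, w, h) in [0, 1] image coordinates."""
    s = scales(len(sizes))
    boxes = []
    for idx, (f, ars) in enumerate(zip(sizes, ratios)):
        s_k, s_next = s[idx], s[idx + 1]  # s_k and s_{k+1} of Eq. 2 (0-based list)
        extra = math.sqrt(s_k * s_next)   # side of the extra square box
        for i in range(f):
            for j in range(f):
                cx, cy = (j + 0.5) / f, (i + 0.5) / f  # centre of cell (i, j)
                for ar in ars:
                    boxes.append((cx, cy, s_k * math.sqrt(ar), s_k / math.sqrt(ar)))
                boxes.append((cx, cy, extra, extra))
    return boxes


print([f * f * (len(a) + 1) for f, a in zip(FEATURE_SIZES, RATIOS)])
# [5776, 2166, 800, 200, 54, 4]
print(len(default_boxes()))  # 9000
```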

FCSE Block

SENet adds a Squeeze-and-Excitation operation after a convolution layer to learn a weight for each feature map channel, enhancing the channels that contribute useful information so that the trained model performs better.

To improve the sensitivity of the model to informative features, the SE block is added after every convolution layer of the extra feature layers. As shown in Fig. 3, the SE block consists of a Squeeze operation and an Excitation operation.

Fig. 3

FCSE block.

Given the W × H × C output of a convolution layer, the Squeeze-and-Excitation operations are performed as follows.

Squeeze: The output X of the convolution layer, a feature map of size C × W × H, is denoted as $X = [x_{1}, x_{2}, \dots, x_{C}]$. The Squeeze operation averages each channel to obtain $Z = [z_{1}, z_{2}, \dots, z_{C}]$, as given in Eq. 3.

z_{c} = F_{sq}(x_{c}) = \frac{1}{W \times H} \sum_{i=1}^{H} \sum_{j=1}^{W} x_{c}(i, j)

Eq. 3

Excitation: After the Squeeze operation, an Excitation operation consisting of two fully connected layers captures the dependencies among all channels, as given in Eq. 4.

S = F_{ex}(Z, W) = \sigma_{2}\left(W_{2} \, \sigma_{1}(W_{1} Z)\right)

Eq. 4

W₁ is the (C/R) × C parameter matrix of the first fully connected layer, W₂ is the C × (C/R) parameter matrix of the second fully connected layer, R is the channel reduction ratio, σ₁(•) is the ReLU activation function, and σ₂(•) is the sigmoid activation function.

To reduce the number of model parameters, two convolution layers, conv1 (kernel size 1×1, C/R output channels) and conv2 (kernel size 1×1, C output channels), replace the two fully connected layers above to form the FCSE block.

Output: The output S of the SE branch is applied to X channel by channel, giving the result in Eq. 5.

\begin{array}{l} Y = [y_{1}, y_{2}, \dots, y_{C}] \\ y_{c} = s_{c} \cdot x_{c} \end{array}

Eq. 5
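The three FCSE steps can be sketched on a tiny feature map in plain Python. Because the squeezed tensor Z is 1 × 1 × C, each 1×1 convolution reduces to a matrix-vector product, so `w1` has shape (C/R) × C and `w2` has shape C × (C/R). Bias terms are omitted to mirror Eq. 4, and the weights are hypothetical; this is an illustration, not the authors' PyTorch module.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1x1(vec, weight):
    """1x1 convolution on a 1x1xC_in tensor = matrix-vector product (weight is C_out x C_in)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def fcse(x, w1, w2):
    """x: C channels, each an H x W list of rows. Returns (Y, S) with y_c = s_c * x_c."""
    # Squeeze (Eq. 3): global average pooling of every channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation (Eq. 4): conv1 (C -> C/R) + ReLU, then conv2 (C/R -> C) + sigmoid.
    hidden = [max(0.0, v) for v in conv1x1(z, w1)]
    s = [sigmoid(v) for v in conv1x1(hidden, w2)]
    # Output (Eq. 5): rescale every element of channel c by s_c.
    y = [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, x)]
    return y, s


# C = 2 channels, reduction ratio R = 2, so C/R = 1 (hypothetical weights).
x = [[[1.0, 1.0], [1.0, 1.0]],
     [[0.0, 0.0], [0.0, 4.0]]]
y, s = fcse(x, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
print([round(v, 3) for v in s])  # [0.881, 0.119]
```

Both channels have the same mean here, so the learned weights alone decide that channel 0 is amplified relative to channel 1.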

Loss Function and Training

The loss function of our model is the MultiBox loss. We train the model using stochastic gradient descent (SGD) with an initial learning rate of 4 × 10⁻⁴, momentum of 0.9, weight decay of 0.0005, and a batch size of 16. The model converges after 120k iterations, which completes training.
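For reference, one update with these hyperparameters can be written out explicitly. The sketch assumes the momentum and weight-decay formulation used by PyTorch's `torch.optim.SGD` (no dampening, no Nesterov), which is consistent with the PyTorch 0.4 environment reported in the experiment section; the paper does not state a learning-rate schedule, so none is modeled.

```python
def sgd_step(params, grads, buffers, lr=4e-4, momentum=0.9, weight_decay=5e-4):
    """One in-place SGD update with L2 weight decay and classical momentum,
    in the form assumed from torch.optim.SGD (dampening 0, no Nesterov):
        d = g + weight_decay * p;  buf = momentum * buf + d;  p = p - lr * buf
    """
    for i, (p, g) in enumerate(zip(params, grads)):
        d = g + weight_decay * p
        buffers[i] = momentum * buffers[i] + d
        params[i] = p - lr * buffers[i]


# Toy example: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, buf = [1.0], [0.0]
for _ in range(3):
    sgd_step(w, [2 * (w[0] - 3)], buf)
print(w[0])
```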

Experiment Design and Results

Data Creation

In this section, the performance of the proposed defect detection model is verified on the TILDA and Xuelang datasets. The experimental environment is an LZ-748GT workstation with an Intel E5-2600 CPU (2200 MHz), 32 GB RAM, and a 16 GB Nvidia TITAN XP GPU. The software environment consists of Python 3.6, PyTorch (GPU) 0.4, and CUDA 9.0.

TILDA Dataset

TILDA is a textile texture database developed within the working group Texture Analysis of the DFG's (Deutsche Forschungsgemeinschaft) major research program “Automatic Visual Inspection of Technical Objects.” As shown in Figs. 4a–h, the dataset contains eight representative textiles: (a)–(d) are pure-color fabrics, (e) and (g) are periodic-pattern fabrics, and (f) and (h) are motif-pattern fabrics. The dataset also contains four defect types: (a) and (b) show E1 (hole), (c) and (d) E2 (spot), (e) and (f) E3 (wire), and (g) and (h) E4 (dark thread). Each fabric background (8 kinds in total) has 50 non-defect samples and 350 defect samples. Each image is 768×512 pixels.

Fig. 4

Partial images of TILDA dataset.

Xuelang Manufacturing AI Challenge Contest Dataset

The 2018 Xuelang Manufacturing AI Challenge was jointly sponsored by the Jiangsu Wuxi Economic Development Zone (Taihu New Town) and Alibaba Cloud Computing Co. Ltd. The contest is hosted on the Alibaba Cloud Tianchi platform, which provides thousands of finely labeled cloth samples. The contest dataset (called the “Xuelang dataset”) covers all kinds of important defects of pure-color fabrics in the textile industry. The data consist of two parts: the original images and the defect annotations. As shown in Figs. 5a–f, the dataset contains three kinds of common defect images: (a) and (b) show “clip mark” defects (Defect1), (c) and (d) “tight end” defects (Defect2), and (e) and (f) “crackiness” defects (Defect3). Each image is 2560×1920 pixels.

Fig. 5

Partial images of Xuelang dataset.

Performance Metrics

In this work, we use precision (P), recall (R), F₁-measure,^13,38,39 mAP,²⁷ and FPS²⁵ to evaluate the performance of the different methods, as defined in Eqs. 6–8.

P = \frac{T P}{T P + F P}

Eq. 6

R = \frac{T P}{T P + F N}

Eq. 7

F_{1}\text{-measure} = \frac{2 \times P \times R}{P + R}

Eq. 8

TP is the number of defective objects correctly detected by predicted boxes; FN is the number of defective objects that are missed, i.e., detected as another category or as background; and FP is the number of background regions falsely detected as defects. P is the proportion of correctly detected defective objects among all predicted boxes, and R is the proportion of correctly detected defective objects among all ground-truth objects. The F₁-measure is a comprehensive indicator that combines recall and precision.
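Eqs. 6–8 translate directly into code. In this sketch, `tp`, `fp`, and `fn` are assumed to be counts already obtained by matching predicted boxes to ground truth; `f1_from_pr` reproduces the F₁ column of Table I from its P and R columns.

```python
def precision_recall_f1(tp, fp, fn):
    """Eqs. 6-8 from detection counts (0.0 when a denominator is empty)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def f1_from_pr(p, r):
    """Eq. 8 applied to precision and recall given in percent."""
    return 2 * p * r / (p + r)


print(precision_recall_f1(tp=8, fp=2, fn=4))  # hypothetical counts
print(round(f1_from_pr(71.8, 65.2), 1))       # 68.3, the F1 of "Ours" in Table I
```

Equivalently, F₁ = 2TP / (2TP + FP + FN), so it rewards a method only when both missed defects and false alarms are few.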

AP is defined as $AP = \int_{0}^{1} P(R) \, dR$. It is the main evaluation metric for object detection models and equals the area enclosed by the P-R curve and the R axis. mAP is the mean of the AP values over all categories. FPS is used to evaluate the detection speed of an object detection model.
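AP can be computed from the points of a P-R curve. This sketch assumes all-point interpolation (precision is replaced by its running maximum from the right before integrating), as in PASCAL VOC 2010 and later; the paper does not say whether it uses this or the 11-point VOC 2007 variant.

```python
def average_precision(recalls, precisions):
    """AP = area under the P-R curve with all-point interpolation (assumed variant).

    `recalls` must be increasing and paired with `precisions`."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_ap(ap_per_class):
    """mAP: mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)


print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
print(mean_ap([65.7, 63.1, 62.6, 63.3]))           # about 63.675 ("Ours", Table I: 63.7)
```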

Analysis of TILDA Dataset

To evaluate the effectiveness of the proposed algorithm on pure-color, periodic-pattern, and motif fabrics, we use the TILDA dataset, which contains 200 images without defects and 1400 images with defects. To reduce overfitting, we apply data augmentation, mainly random rotation by 90, 180, and 270 degrees. The original dataset of 1600 images is split into 1280 training images and 320 test images. After rotation, the TILDA dataset is expanded to 2560 images.
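Rotating a detection image also requires rotating its boxes. The sketch below assumes counter-clockwise rotation, an image stored as a list of pixel rows, and boxes given as (xmin, ymin, xmax, ymax) in pixel coordinates; the paper does not specify the rotation direction or box format. Rotations by 180° and 270° are obtained by repeating the 90° step.

```python
import random


def rotate90_ccw(image, boxes):
    """Rotate an H x W image (list of rows) 90 degrees counter-clockwise.

    Boxes are (xmin, ymin, xmax, ymax) in pixel units; a point (x, y) maps to (y, W - x)."""
    h, w = len(image), len(image[0])
    rotated = [[image[j][w - 1 - i] for j in range(h)] for i in range(w)]
    new_boxes = [(y1, w - x2, y2, w - x1) for x1, y1, x2, y2 in boxes]
    return rotated, new_boxes


def random_rotate(image, boxes, rng=random):
    """Rotate by a randomly chosen 90, 180 or 270 degrees (counter-clockwise, assumed)."""
    angle = rng.choice([90, 180, 270])
    for _ in range(angle // 90):
        image, boxes = rotate90_ccw(image, boxes)
    return image, boxes, angle


img = [[1, 2, 3],
       [4, 5, 6]]
print(rotate90_ccw(img, [(0, 0, 1, 1)]))  # ([[3, 6], [2, 5], [1, 4]], [(0, 2, 1, 3)])
```

The box around pixel value 1 still encloses that pixel after every rotation, which is the property a detector's labels need.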

The proposed method is compared with three well-known object detection models, Faster R-CNN, YOLO V3, and the original SSD, using the indicators described previously (mAP, P, R, F₁-measure, and FPS). As shown in Table I, on the TILDA dataset the mAP of the proposed method is 4.8 and 30.4 percentage points higher than that of Faster R-CNN and YOLO V3, respectively, and 3.3 points higher than that of the original SSD. For the comprehensive F₁-measure, the proposed method is 7.9, 21.6, and 3.2 points higher than Faster R-CNN, YOLO V3, and the original SSD, respectively, which shows that our method achieves a good trade-off between P and R. Meanwhile, detection runs at 25.6 FPS, which meets the requirement for real-time detection (≥ 24 FPS). These results confirm that our SSD-based method can efficiently detect hole, spot, wire, and dark thread defects on pure-color and motif-pattern fabrics in real time, with 63.7% mAP on defect regions. Detection accuracy is improved while real-time speed is maintained; to a certain extent, the proposed method balances detection speed and accuracy.

We also carried out ablation experiments with three comparison settings: 1) the original SSD, 2) the original SSD plus {4, 1/4} default boxes, and 3) SSD plus the FCSE block. As shown in Table II, our full method performs best. Adding the FCSE block to the extra feature layers increases mAP by 2.4 percentage points over the original SSD, and adding the {4, 1/4} aspect ratios increases it by 1.8 points. The ablation results show that the FCSE block can increase the weights of the feature map channels where defective regions are located, and that this channel attention mechanism improves the detection accuracy of the proposed model.

Table I. Performance Comparison of Different Detection Approaches on TILDA Dataset

Method	E1 AP (%)	E2 AP (%)	E3 AP (%)	E4 AP (%)	mAP (%)	P (%)	R (%)	F₁-measure (%)	Run Time (FPS)
Faster R-CNN	66.8	62.9	54.8	51.3	58.9	65.9	55.8	60.4	11.1
SSD	63.6	63.3	53.0	61.6	60.4	67.5	63.0	65.1	33.3
YOLO V3	45.5	46.4	16.4	24.9	33.3	59.7	38.4	46.7	19.8
Ours	65.7	63.1	62.6	63.3	63.7	71.8	65.2	68.3	25.6

Adding the {4, 1/4} aspect ratios enhances the model's ability to detect long strip defects and improves the AP for E3 (wire) and E4 (dark thread). As shown in Fig. 6, our method reaches a low loss during training. Over 120k iterations, the total loss, the classification (confidence) loss, and the location (regression) loss gradually decreased until convergence, completing training. The minimum total loss is 1.80. Some results of our method on the TILDA dataset are shown in Fig. 7.

Fig. 6

Training loss curves of our method.

Fig. 7

The detection results of our method.

Table II. Ablation Study of Our Method on the TILDA Dataset

Method	mAP (%)
SSD	60.4
SSD + {4, 1/4}	62.2
SSD + FCSE	62.8
Ours	63.7

Fig. 8

Sketch of partial magnification of the image defect areas. Because the images have a high resolution (2560×1920) and the defect areas are small, only the local area of the original image containing the defects is shown. (a) and (b) are images with “clip mark” defects, (c) is an image with a “tight end” defect, and (d) is an image with a “crackiness” defect.

Analysis of Xuelang Dataset

To verify the detection performance of the proposed model in realistic scenarios, we use the Xuelang dataset as a supplementary experiment. The Xuelang dataset contains 307 images with defects: 122 Defect 1, 133 Defect 2, and 52 Defect 3 images. To reduce overfitting, we augment the data with image cropping, rotation, brightness changes, and horizontal and vertical flips. To balance the number of images across defect classes, we expanded the Defect 1, Defect 2, and Defect 3 images by factors of 3, 3, and 7, respectively. After augmentation, there are 366 Defect 1, 399 Defect 2, and 364 Defect 3 images, for a total of 1129 images.
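The flip augmentations must transform the boxes as well as the pixels. The sketch below assumes boxes given as (xmin, ymin, xmax, ymax) in pixel coordinates and an image stored as a list of rows; cropping, rotation, and brightness changes are omitted. The last lines reproduce the per-class counts after expansion from the factors stated above.

```python
def hflip(image, boxes):
    """Mirror left-right; boxes are (xmin, ymin, xmax, ymax) in pixel units (assumed)."""
    w = len(image[0])
    return [row[::-1] for row in image], [(w - x2, y1, w - x1, y2) for x1, y1, x2, y2 in boxes]


def vflip(image, boxes):
    """Mirror top-bottom; the y coordinates of each box are reflected."""
    h = len(image)
    return image[::-1], [(x1, h - y2, x2, h - y1) for x1, y1, x2, y2 in boxes]


# Per-class original counts and expansion factors reported for the Xuelang dataset.
ORIGINAL = {"Defect1": 122, "Defect2": 133, "Defect3": 52}
FACTORS = {"Defect1": 3, "Defect2": 3, "Defect3": 7}
expanded = {name: ORIGINAL[name] * FACTORS[name] for name in ORIGINAL}
print(expanded, sum(expanded.values()))
# {'Defect1': 366, 'Defect2': 399, 'Defect3': 364} 1129
```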

As shown in Table III, on the Xuelang dataset the mAP of the proposed method is 14.1, 3.6, and 12.9 percentage points higher than that of Faster R-CNN, the original SSD, and YOLO V3, respectively. For the comprehensive F₁-measure, the proposed method is 15.8, 13.8, and 5.3 points higher than Faster R-CNN, YOLO V3, and the original SSD, respectively. Detection also runs at a high speed of 19.5 FPS. Adding the {4, 1/4} aspect ratios enhances the model's ability to detect long strip defects: compared with the original SSD, the AP for Defect2 (tight end) and Defect3 (crackiness) increases by 10.2 and 2.7 percentage points, respectively. These results show that the proposed method achieves better detection results on the Xuelang dataset.

Table III. Performance Comparison of Various Detection Approaches on Xuelang Dataset

Method	Defect 1 AP (%)	Defect 2 AP (%)	Defect 3 AP (%)	mAP (%)	P (%)	R (%)	F₁-measure (%)	Run Time (FPS)
Faster R-CNN	36.3	21.8	40.9	33.0	48.9	37.5	42.5	6.4
SSD	42.8	45.2	42.7	43.5	57.0	49.6	53.0	21.4
YOLO V3	38.0	29.3	35.2	34.2	46.8	42.5	44.5	15.6
Ours	40.6	55.4	45.4	47.1	64.2	53.4	58.3	19.5

Compared with the TILDA dataset, the Xuelang dataset, as a contest dataset, contains more images with small and difficult objects. For example, in Fig. 5a, the defective area of about 290×260 pixels covers less than 5% of the image (2560×1920 pixels). As shown in Figs. 8a–d, images in the Xuelang dataset mostly have pure-color textured backgrounds, and the defect areas are similar in color to the texture, which reduces detection accuracy to a certain extent. Our model achieves 47.1% mAP at 19.5 FPS. Some results of our method on the Xuelang dataset are shown in Fig. 9, which shows that the proposed model can detect defects on the fabric surface even when images contain difficult objects.

Fig. 9

The detection results of our method on the Xuelang dataset. (a) and (b) show detection results for images with Defect1, (c) and (d) with Defect2, and (e) and (f) with Defect3.

In summary, our SSD-based detection method can efficiently detect various defects on pure-color, periodic-pattern, and motif-pattern fabrics at high speed.

Conclusion

This study presents an improved fabric defect detection method based on SSD. The method uses a powerful CNN to extract defect features and introduces a channel attention mechanism, the FCSE block, to enhance the weights of defective areas in each channel of the feature map. Adjusting the number of default boxes improves the model's accuracy in detecting long strip defects. Experimental results show that the proposed method detects at high speed and accurately detects various defects on fabric surfaces with different textures. Our future work will focus on pixel-level defect segmentation.


Acknowledgments

The authors would like to thank the financial support received from the National Natural Science Foundation of China (No.61801121).

References

1. Hanbay; Talu, M. F.; Özgüven, Ö. F. Optik 2016, 127 (24), 11960–11973.

2. Zhu; Pan; Gao. AUTEX Research Journal 2015, 15 (3), 226–232.

3. Mak, K. L.; Peng; Yiu, K. F. C. Image and Vision Computing 2009, 27 (10), 1585–1592.

4. Chan, C. H.; Pang, G. K. IEEE Transactions on Industry Applications 2000, 36 (5), 1267–1276.

5. Jing; Liu; Li; Zhang. The Journal of the Textile Institute 2016, 107 (10), 1305–1313.

6. Alata; Ramananjarasoa. Pattern Recognition Letters 2005, 26 (8), 1069–1081.

7. Liu; Chang; Liang. Pattern Recognition and Artificial Intelligence 2018, 31 (2), 182–189.

8. Ngan, H. Y. T.; Pang, G. K. H.; Yung, N. H. C. Image and Vision Computing 2011, 29 (7), 442–458.

9. Zhang; Lu; Li. Pattern Recognition Letters 2010, 31 (13), 2033–2042.

10. Rebhi; Benmhammed; Abid. Journal of Photonics 2015, 2015, article ID 376163.

11. Gao; Zhou; Wong, W. K.; Gao. Woven Fabric Defect Detection Based on Convolutional Neural Network for Binary Classification. In Proceedings, Artificial Intelligence on Fashion and Textiles, Hong Kong, China, 2018, pp 307–313.

12. Jing, J. F.; Ma; Zhang, Y. Y. Coloration Technology 2019, 135 (3), 213–223.

13. Ouyang, W. B.; Xu, B. G.; Hou. IEEE Access 2019, 7, 70130–70140.

14. Xie; Zhang; Wu. IEEE Access 2019, 7, 182320–182334.

15. H. G.; Wang; Huang, X. B. Engineering Applications of Artificial Intelligence 2009, 22 (2), 224–235.

16. Niu; Lin; Ke. IET Computer Vision 2017, 11 (2), 161–172.

17. LeCun; Bottou; Bengio. Gradient-based learning applied to document recognition. In Proceedings, IEEE, Leuven, Belgium, 1998, 86 (11), 2278–2324.

18. ; Chen; Yong; Jiang. IEEE Transactions on Image Processing 2019, 28 (12), 6077–6090.

19. Dai. Signal Processing: Image Communication 2019, 70, 79–88.

20. Wei; Zhu; Qian. One-stage object detection networks for inspecting the surface defects of magnetic tiles. In Proceedings, 2019 IEEE International Conference on Imaging Systems and Techniques (IST), Krakow, Poland, 2019, pp 1–6.

21. ; Sahoo; Hoi, S. C. H. Neurocomputing [Online], 2020.

22. Ren; He; Girshick. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017, 39 (6), 1137–1149.

23. ; Gkioxari, K. G.; Dollár. Mask R-CNN. In Proceedings, IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp 2980–2988.

24. Redmon; Divvala; Girshick. You only look once: Unified, real-time object detection. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp 779–788.

25. Redmon; Farhadi. YOLO9000: Better, faster, stronger. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, USA, 2017, pp 6517–6525.

26. Redmon; Farhadi. arXiv:1804.02767 [Online], 2018.

27. Liu; Anguelov; Erhan; Szegedy. SSD: Single shot multibox detector. In Proceedings, European Conference on Computer Vision, Amsterdam, Netherlands, 2016, pp 21–37.

28. Lin, T. Y.; Goyal; Girshick. Focal loss for dense object detection. In Proceedings, IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp 2999–3007.

29. Simonyan; Zisserman. arXiv:1409.1556 [Online], 2014.

30. Liu; Liu; Li. Fabric defect detection based on faster R-CNN. In Proceedings, Ninth International Conference on Graphic and Image Processing, Qingdao, China, 2017, 10615, 1–9.

31. Liu; Guo; Yang. Research on Texture Defect Detection Based on Faster-RCNN and Feature Fusion. In Proceedings, 11th International Conference on Machine Learning and Computing, Da Lat, Vietnam, 2019, pp 429–433.

32. Liu; Liu; Li. Fabric defects detection based on SSD. In Proceedings, 2nd International Conference on Graphics and Signal Processing, Sydney, Australia, 2018, pp 74–78.

33. Zhang; Zhang; Li. Yarn-dyed fabric defect detection with YOLOV2 based on deep convolution neural networks. In Proceedings, IEEE 7th Data Driven Control and Learning Systems Conference, Hubei, China, 2018, pp 170–174.

34. Jiao; Zhang; Liu. IEEE Access 2019, 7, 128837–128868.

35. Jie; Li; Albanie. Squeeze-and-Excitation Networks. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Utah, USA, 2018, pp 7132–7141.

36. TILDA dataset. https://lmb.informatik.uni-freiburg.de/resources/datasets/tilda.en.html (accessed October 2020).

37. Xuelang dataset. https://tianchi.aliyun.com/competition/entrance/231666/introduction?spm=5176.12281957.1004.14.38b03eafqnVnpF (accessed October 2020).

38. Chang; Gu; Liang. Mathematical Problems in Engineering 2018, 2018, article ID 3709821.

39. Han, Y.-J.; Yu, H.-J. Applied Sciences 2020, 10 (2511), 1–10.

An Improved Fabric Defect Detection Method Based on SSD
