A lightweight model for digital printing fabric defect detection based on YOLOX

Abstract

Online detection of digital printing defects is a necessary but challenging topic. The performance of the current detection methods is still not ideal for the diversified patterns of digital printing fabric defects and the realtime requirements of online detection. In this paper, we proposed a lightweight model of digital printing fabric defect detection based on YOLOX. Firstly, according to the characteristics of many types of defects and complex background of digitally printed fabrics, a defect detection network structure based on YOLOX is constructed. Then, the SE attention module is introduced to enhance important features and weaken unimportant features, which make the extracted features more directional. And it can further solve the influence of small feature size on the detection accuracy of small targets. The experimental results show that the proposed model has a detection accuracy of 66.2 mAP on our self-built dataset, which is 2.7 percentage points higher than YOLOX. This method can effectively solve the problem that low detection accuracy of small defects. The proposed model can meet the real-time requirements and improve the detection accuracy of small target defects.

Keywords

Defect detection YOLOX attention module computer vision

Introduction

Due to its environmental protection, technology, fashion and other characteristics, digital printing has become more and more widely used in the textile field. But during the printing process, there are many factors that may affect the final fabric quality, such as uneven sizing and coloring of the fabric, nozzle failure, mechanical failure, worker error and so on. There are eight common fabric defects as follows: PASS track, PASS tracks, ink leakage, blank, fluff, water leakage, ghost, and pulp,¹ as shown in Figure 1. In order to detect defects in time and reduce losses, it is very important to find a fast and accurate method for fabric defect detection. The existing fabric defect detection mostly relies on manual detection, which not only has high labor cost, but also has inefficiency and insufficient detection accuracy.² In recent years, with the rapid development of artificial intelligence, it has become a possibility to replace manual detection with network models of deep learning algorithms.

Figure 1.

Defect samples in the dataset: (a) PASS track, (b) PASS tracks, (c) ink leakage, (d) blank, (e) fluff, (f) water leakage, (g) ghost, and (h) pulp.

Target detection algorithms based on deep learning can be divided into “single-stage” and “two-stage"” target detection algorithms. Compared with the single-stage algorithm, the two-stage algorithm has higher detection accuracy. Based on this, many two-stage algorithms have been successfully applied in the field of industrial inspection in recent years.^3
–5 For example, Luo³ et al. adopted a decoupled two-stage object detection framework based on convolutional neural network (CNN) in the surface defect detection of flexible printed circuit boards and achieved good detection results. Although two-stage algorithms have high detection accuracy, their detection speed is difficult to meet real-time detection scenarios in practical industrial applications. The You Only Look Once (YOLO)^6
–9 series networks are single-stage models with higher real-time performance. With the continuous development of network and the continuous improvement of detection accuracy in recent years, they have also been widely used in the industrial field.^10
–15 For example, Xu¹⁰ et al. proposed a metal surface defect detection model based on the improved YOLOv3 model, which has made great progress compared with traditional manual detection in terms of accuracy and real-time performance. Kou¹² et al. proposed a YOLOv3-based strip steel surface defect detection model. By introducing a specially designed dense convolution to extract rich feature information, the detection effect of the detection model is better than other models.

Many deep learning models have also been successfully applied in the field of fabric defect detection in recent years.^16
–18 For example, Liu¹⁶ et al. proposed an improved Single Shot MultiBox Detector (SSD)-based model for fabric defect detection. Xie¹⁷ et al. proposed an improved fabric defect detection model based on the SSD algorithm by adding a fully convolutional squeezing and excitation (FCSE) module to improve the detection accuracy of the model. Adjust the number of default boxes to suit the detection of long defects on the fabric surface.

Although the above-mentioned fabric defect detection algorithms have improved the detection performance from different aspects, they all have the problem that the detection effect of small objects is not ideal, and because they all have anchor networks, the anchor frame will increase the computational cost during training. Therefore, this paper takes the single-stage anchor-free network YOLOX¹⁹ as the model basis, then adds the SE attention module, which enhances the important features and weakens the unimportant features, thereby making the extracted features more directional, so as to improve the detection accuracy of small objects. The results show that the improved YOLOX algorithm has achieved a great improvement in our own fabric defect dataset, and can effectively solve the problem of low detection accuracy of small objects.

YOLOX network model

YOLOX model

The previous YOLO series of networks all have anchor networks. Multiple anchor boxes are used in the prediction segment to predict the category and position of objects. YOLOX is changed to an anchor-free network, which greatly reduces the number of parameters of the detection head while ensuring performance improvement. YOLOX can be divided into YOLOX-x, YOLOX-Darknet53, YOLOX-l, YOLOX-m, YOLOX-s, YOLOX-Tiny and YOLOX-Nano according to the network size. The network structure of YOLOX can be divided into three parts: the backbone network, the neck network and the prediction head.

Backbone network

The backbone network of YOLOX adopts the YOLOv3-SPP structure. In the network structure of CNN, the final classification layer is composed of full connections. The number of fully connected features is fixed, so the size of the image is fixed when it is input to the network. However, images need to be cropped, stretched, or deformed that due to the variety of image sizes in reality, so leads to the distortion of the image and affects the final detection accuracy. The full name of SPP is Spatial Pyramid Pooling, which was proposed by He et al.,²⁰ that effectively solves the problem of image distortion. At the same time, the SPP module integrates local features and global features, which enriches the expressive ability of the feature map and improves the detection accuracy.

In addition, the backbone network of YOLOX also uses the Focus network structure, which is to get a value for every other pixel in a picture, so that four independent feature layers will be obtained, and then the four feature layers will be stacked, so that the width and height information is concentrated into the channel information, which quadruples the input channel. This is equivalent to a downsampling but no calculation is performed, which can reduce the computational cost. The specific method is shown in Figure 2.

Figure 2.

Focus structure:(a) is the 3-channel feature diagram of the input image and (b) is the 12-channel feature diagram with the width and height information concentrated on the channel.

In addition, the author also changed some training strategies, adding training techniques such as EMA weight update and Cosine learning rate mechanism, using the Intersection over Union (IoU) loss function to train the reg branch, and the BCE loss function to train the cls and obj branches.

Neck network

The neck network of YOLOX adopts the structure of FPN + PAN. FPN²¹ is top-down, which integrates high-level features and bottom-level features through upsampling, and conveys strong semantic features from top to bottom. PAN, on the other hand, fuses bottom-up features with down-sampling and high-level features, and conveys strong localization features from bottom-up, so that both the semantic features and location features of the entire convolutional layers are enhanced. The specific method is shown in Figure 3.

Figure 3.

FPN + PAN structure.

Head network

The prediction segment of YOLOX has been changed from the YOLO Head of the previous YOLO series to the Decoupled Head, which greatly improves the convergence speed. It consists of a 1 × 1 conv layer to reduce the number of channels, and then two parallel branches with 3 × 3 conv layers each. The specific structure is shown in Figure 4.

Figure 4.

Decoupled head structure.

The previous YOLO series of networks used Anchor-based, but this requires anchor points to be designed in advance, densely sampled images, and contains a lot of negative sample information, which requires a huge amount of computation. YOLOX adopts the Anchor-free method to reduce the number of predictions per position from 3 to 1. This method reduces the number of parameters in the prediction segment and improves performance and speed.

Loss function

The loss function of YOLOX includes three parts: iou_loss, obj_loss and cls_loss. Among them, iou_loss is used to calculate the detection effect of the predicted frame and the real frame, as shown in formulas (1) and (2), obj_loss is used to determine the target box is the object or the background, cls_loss is used to determine the category, and both use BCE With Logits Loss, as shown in formulas (3) and (4):

I O U = \frac{A \cap B}{A \cup B}

(1)

L o s s_{I O U} = 1 - I O U

(2)

L o s s_{B C E} = {l_{1}, \cdot \cdot \cdot, l_{N}}

(3)

\begin{matrix} l_{n} = - [y_{n} \cdot \log (σ (x_{n})) + (1 - y_{n}) \cdot \log (1 - σ (x_{n}))] \end{matrix}

(4)

A in formula (1) is the prediction frame, and B is the real frame. The input tensors in formulas (3) and (4) are (x, y), N is the batch size, and n labels are predicted for each batch, $σ$ representing the sigmoid function.

Proposed network model

SE module

The full name of the SE module is the Squeeze-and-excitation module, which mainly includes two parts: Squeeze and Excitation. The basic principle of SE is to process the convolved feature map, and then obtain a one-dimensional vector consistent with the number of channels, the number on this vector is used as the evaluation score of each corresponding channel, and then filter out the channel attention. The whole process is shown in Figure 5.

Figure 5.

Overall structure of SE module.

The first step is the compression operation, using a global average pooling, after which the feature maps are compressed into 1 × 1 × C vectors. The second step is the excitation operation, which uses two fully connected layers and an activation function to establish a connection between channels, and then normalized weights is obtained through the Sigmoid activation function. The last step is the Scale operation, which is weighted channel-by-channel to each channel of the original feature map through multiplication to complete the re-calibration of the original features by channel attention.

Improved YOLOX model

Since the size of the small target in the feature map is tiny, it is difficult for the features to be effectively trained, resulting in poor feature training for small targets and low detection accuracy. In this regard, this paper adds the SE module to enhance the important features and weaken the unimportant features by controlling the size of the scale, so as to make the extracted features more directional, thereby improving the small target detection effect. The network structure is based on YOLOX, and the improved network structure is shown in Figure 6. The overall improvement idea is as follows: add SE modules to the three effective feature layers input to the feature pyramid part, so that the feature pyramid of the Neck part is fused with the important features that have been calibrated and screened during upsampling, thereby improving the model’s ability to small target detection capability.

Figure 6.

Improved YOLOX network model.

Experiment

Datasets and data augmentation

The experimental data adopts the fabric defect images collected by ourselves, and uses Labelimg software for data calibration. There are eight types of defects in the dataset: PASS track, PASS tracks, ink leakage, blank, ghost, fluff, water leakage, pulp. There are10,320 images in total, 80% of which are used as training set, 10% as validation set, and 10% as test set.

Data augmentation can expand the dataset and improve the robustness of the model. This experiment uses the Mosaic data enhancement method. The basic principle is to crop four pictures at random, and then stitch the cropped pictures into one picture, which not only enriches the background of the picture, but also improves the batch size in disguised form. The specific operation is shown in Figure 7.

Figure 7.

Mosaic data augmentation.

Model training

The experiment is trained and tested on the Ubuntu operating system. The CPU uses Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50 GHz, the memory is 128 G, the graphics card uses NVIDIA RTX 2080Ti, and the deep learning framework is Pytorch. During the training process, the training batch size is set to 16, the number of images in the training set in the experiment is 8256, one training batch contains 16 images, and one training epoch contains 516 training batches, the experimental training cycle is set to 200 epochs, of which the first 50 training cycles are the freezing stage, that is, the model backbone is frozen, and the feature extraction network does not change, so the existing occupation is small, only fine-tuning the network can speed up the model training speed. The model is trained by stochastic gradient descent (SGD), the initial learning rate is set to 0.01, and the learning rate is adjusted by cosine annealing algorithm.

Evaluation indicators

In this paper, the Average Precision (AP), the mean Average Precision(mAP), the Frames Per Second (FPS), and the Parameter are used as evaluation indicators. The AP and mAP are used to evaluate the detection accuracy of the model, the FPS is used to evaluate the detection speed of the model, and the Parameter is used to evaluate the weight of the model.

In this paper, the precision rate and the recall rate are used to calculate the AP. The calculation formulas of the precision rate and the recall rate are formula (5) and formula (6), The calculation formulas of AP and mAP are formula (7) and formula (8):

\begin{matrix} P = \frac{T P}{\begin{matrix} T P + F P \end{matrix}} \times 100 % \end{matrix}

(5)

\begin{matrix} R = \frac{T P}{\begin{matrix} T P + F N \end{matrix}} \times 100 % \end{matrix}

(6)

A P = \int_{0}^{1} P (R) d R

(7)

m A P = \frac{\sum_{i = 1}^{k} A P_{i}}{k}

(8)

Where TP is the number of correctly predicted positive samples, FN is the number of incorrectly predicted negative samples, and FP is the number of incorrectly predicted positive samples, k is the total number of categories detected.

Experimental results and analysis

In general, pre-trained model will be downloaded from the Internet before network training, and the online pre-trained model will use different datasets for training, the commonly used public datasets are VOC dataset and MS COCO dataset.

In order to verify the influence of pre-trained model on the final network performance, we downloaded two kinds of pre-trained models based on different datasets. And then we trained them based on our algorithm, and selected two kinds of datasets to ensure the accuracy of experimental results. The final experimental results are shown in Table 1.

Table 1.

Performance comparison experiment of different pre-trained models.

Algorithm	Pre-trained model	Dataset	mAP	AP₅₀	AP₇₅	Param(M)	FPS
Ours	Coco	Ours	66.2	93.0	71.0	5.080	31
Ours	Voc	Ours	63.3	91.7	68.1	5.080	31
Ours	Coco	Voc	50.6	70.4	56.2	5.102	30
Ours	Voc	Voc	44.2	66.2	49.9	5.102	30

As can be seen from Table 1, for the same kind of network and the same kind of dataset, the model of pre-trained network based on COCO dataset ultimately has better network performance. Therefore, the pre-trained model based on COCO dataset is selected for all the networks.

The loss variation curve trained using the improved YOLOX network model is shown in Figure 8. According to the idea of transfer learning, the features of the backbone network are generic, so in order to speed up the training, a pre-trained model that has been trained on the COCO dataset is used. The training is divided into two stages, namely the freezing stage and the unfreezing stage. The freezing stage is set to speed up the training speed under the condition of insufficient machine performance. Unfreeze the backbone network after training for 50 epochs, and then train for 150 epochs, for a total of 200 epochs. As can be seen from Figure 9, the loss value decreased in the first 50 epochs, and fluctuated after unfreezing the backbone network, and then began to decrease rapidly. The training loss and validation loss gradually stabilized after 175 epochs, stabilized at 2.59 and 2.88 respectively, and the model reached a state of convergence.

Figure 8.

Loss change curve.

Figure 9.

Improved YOLOX model detection renderings.

In order to verify the performance of the improved model, we compared the commonly used lightweight object detection network model with our model, and the detection accuracy results are shown in Table 2. The comparison shows that our model has the best performance on mAP, AP₅₀, AP₇₅, AP_M, and AP_L. Compared with YOLOX-Tiny, mAP performance improved by 2.7 percentage points, AP₅₀ performance improved by 2.3 percentage points, AP₇₅ performance improved by 3.3 percentage points, AP_S performance improved by 1.3 percentage points, AP_M performance improved by 2.4 percentage points and AP_L performance improved 4.2 percentage points.

Table 2.

Results of the detection accuracy comparison.

Algorithm	mAP	AP₅₀	AP₇₅	AP_S	AP_M	APL	Param(M)	FPS
Efficientnet-YOLOv3	45.0	77.6	49.3	26.4	29.8	50.9	7.016	24
Ghostnet-YOLOv4	42.1	74.4	40.4	21.7	29.7	49.2	11.041	22
Mobilenetv2-YOLOv4	48.3	76.7	49.7	23.0	30.7	54.5	10.413	30
Mobilenetv3-YOLOv4	43.2	76.6	41.3	21.3	29.3	49.4	11.341	26
YOLOv4-Tiny	40.1	77.6	32.5	27.9	33.4	44.5	6.057	87
YOLOX-Tiny	63.5	90.7	67.7	44.6	48.5	70.6	5.056	35
SSD	59.2	84.7	64.2	28.4	42.5	68.7	26.121	10
Mobilenetv3-SSD	42.7	71.9	46.1	21.4	29.6	49.4	6.058	39
Ours	66.2	93.0	71.0	45.9	50.9	74.8	5.080	31

The detection speed results are shown in Table 2. The network model with the minimum of parameters is YOLOX-Tiny, our model is the second, because the SE module is added, but the amount of parameters added is very small, only 0.024 M. The model with the fastest detection speed is YOLOv4-Tiny, our model is the third fastest, but the detection accuracy of YOLOv4-Tiny is the worst, so in terms of overall model performance, our model performs the best. Compared with other models, since the improved algorithm in this paper increases the SE module, the calculation amount increases and the detection speed decreases slightly. However, when the speed reduction is not obvious, the detection accuracy of the improved algorithm in this paper is significantly higher. Therefore, the improved algorithm proposed in this paper can effectively improve the performance of the fabric defect detection. The final detection effect is shown in Figure 9.

In order to prove the robustness of our model, we also used VOC public dataset to conduct comparative tests, and the experimental results are shown in Table 3.

Table 3.

Comparative test based on VOC dataset.

Algorithm	mAP	AP₅₀	AP₇₅	AP_S	AP_M	APL	Param(M)	FPS
Efficientnet-YOLOv3	40.5	70.5	42.4	11.5	20.9	47.8	7.050	23
Ghostnet-YOLOv4	40.8	69.5	42.4	9.6	20.6	48.0	11.105	22
Mobilenetv2-YOLOv4	43.8	73.5	47.8	10.7	26.6	50.5	10.478	28
Mobilenetv3-YOLOv4	43.3	72.4	46.3	8.3	24.7	49.6	11.406	24
YOLOv4-Tiny	34.2	63.6	31.5	14.2	21.1	38.6	6.103	86
YOLOX-Tiny	49.5	69.2	56.0	19.5	31.4	55.5	5.073	33
SSD	52.3	77.5	59.0	3.9	31.3	60.9	26.151	10
Mobilenetv3-SSD	38.3	66.7	38.7	3.3	14.9	46.9	6.071	38
Ours	50.6	70.4	56.2	19.6	31.8	56.7	5.102	30

It can be seen from the Table 3 that in terms of detection accuracy, SSD has reached the optimum in mAP, AP₅₀, AP₇₅ and AP_L, with values of 52.3, 77.5, 59.0 and 60.9. Our model reached the optimal value in AP_S and AP_M, with values of 19.6 and 31.8. Because SSD has a large number of network parameters, so the detection performance will be stronger after using a large amount of data for training. However, due to the large number of parameters, more resources need to be calculated. Therefore, SSD performs poorly in terms of detection speed and FPS can only reach 10, which cannot meet the real-time requirements. The FPS of our model reaches 30, which meets the requirement of real-time performance. Therefore, our algorithm is still the best from the comprehensive performance.

Conclusion

This paper proposes a YOLOX defect detection algorithm that introduces SE attention module. The method is based on the YOLOX network structure. By introducing the SE attention module, the important features are enhanced and the unimportant features are weakened, so that the extracted features are more directional, which effectively alleviates the impact of small feature size on the detection accuracy of small targets, thereby the detection accuracy of the model for small targets is improved, and multiple performance indicators of the model are improved without increasing the amount of model parameters, and it meets the requirements of real-time detection. The experimental results show that the detection precision mAP value of this algorithm in our data set reaches 66.2, AP₅₀ reaches 93, and other four detection indexes also reach the optimal. In terms of detection speed, our algorithm reaches 31 FPS, which meets the real-time requirement. In summary, the comprehensive performance of our algorithm is optimal, which can realize the detection task of digital printing fabric defects.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Zebin Su

Huanhuan Zhang

References

Zhang

, et al CARL-YOLOF: A well-efficient model for digital printing fabric defect detection. J Eng Fibers Fabr 2022; 17: 15589250221135087.

Hanbay

Talu

Özgüven

ÖF.

Fabric defect detection systems and methods—A systematic literature review. Optik 2016; 127(24): 11960–11973.

Luo

Yang

, et al FPCB surface defect detection: A decoupled two-stage object detection framework. IEEE Trans Instrum Meas 2021; 70: 1–11.

Li Dongjie

李

Li Ruohao

李

. Mug defect detection method based on improved faster RCNN. Laser Optoelectron Prog 2020; 57: 353–360.

Sun

Qin

Xue

. Sheep delivery scene detection based on faster-RCNN. In: 2019 International Conference on Image and Video Processing, and Artificial Intelligence. SPIE, 2019, vol. 11321, pp.297–303.

Redmon

Divvala

Girshick

, et al You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp.779–788.

Redmon

Farhadi

YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp.7263–7271.

Redmon

Farhadi

Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.

Bochkovskiy

Wang

Liao

HYM

. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.

10.

Zhang

Wang

Metal surface defect detection using modified YOLO. Algorithms 2021; 14(9): 257.

11.

Tan

Cai

, et al Automatic detection of sewer defects based on improved you only look once algorithm. Autom Constr 2021; 131: 103912.

12.

Kou

Liu

Cheng

, et al Development of a YOLO-V3-based model for detecting defects on steel strip surface. Measurement 2021; 182: 109454.

13.

Zhong

Deng

. Real-Time Detection based on Modified YOLO for Herniated Intervertebral Discs. In: Proceedings of the 2019 4th International Conference on Intelligent Information Processing, 2019, pp.466–470.

14.

Zheng

Liu

, et al Sleeper defect detection based on improved YOLO V3 algorithm. In: 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA). IEEE, 2020, pp.955–960.

15.

Gai

Chen

Yuan

A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Comput Appl 2023; 35: 13895–13906.

16.

Liu

, et al Fabric defects detection based on SSD. In: Proceedings of the 2nd International Conference on Graphics and Signal Processing, 2018, pp.74–78.

17.

Xie

Zhang

An improved fabric defect detection method based on ssd. AATCC J Res 2021; 8(1_suppl): 181–190.

18.

Zhang

Qiao

, et al Attention-based feature fusion generative adversarial network for yarn-dyed fabric defect detection. Text Res J 2023; 93: 1178–1195.

19.

Liu

Wang

, et al Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430, 2021.

20.

Zhang

Ren

, et al Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 2015; 37(9): 1904–1916.

21.

Lin

Dollár

Girshick

, et al Feature pyramid networks for object detection. In: Procee-dings of the IEEE conference on computer vision and pattern recognition, 2017, pp.2117–2125.