Sage Journals: Discover world-class research

Abstract

To improve the efficiency of automated production in textile workshops and reduce labor cost and error rate, this study designs YoloColor-Net, a multi-label recognition model based on deep learning that automatically detects and recognizes the bobbin shape and yarn color on the yarn frame. First, a dataset of 12,173 textile bobbin images was collected and constructed independently. Second, the standard yolov5 model is extended with a convolutional network for yarn color recognition, which solves the problem of missed bobbin detections when bobbin shape and yarn color are detected at the same time. Third, the lightweight DSConv module replaces the ordinary convolution in the Backbone layer to reduce the parameters of YoloColor-Net and increase its running speed. Finally, an improved attention mechanism (ICBAM) is added to the Backbone and Neck layers of YoloColor-Net to improve the accuracy of bobbin recognition. The experimental results show that the improved YoloColor-Net model reaches a detection accuracy (mAP@0.5) of 99.3%, while the number of model parameters is reduced by 10.4% and the GFLOPS by 17.1%. FPS increases from 43.13 to 67.23, a gain of 55.9%. The proposed model therefore meets the basic requirements of automatic bobbin detection and recognition and can be considered for initial local deployment.

Keywords

Bobbin shape recognition, yarn color recognition, yolov5 model, convolutional neural network, improved attention mechanism

Introduction

In the textile industry, the profit margin of enterprises is shrinking because the costs of raw materials and labor keep rising. With the rapid development of machine vision technology, deep learning has been widely adopted across industries and has contributed substantially to reducing cost and increasing efficiency. The textile industry is therefore also working to replace manual labor with machine vision, improve the efficiency of automated production in textile workshops, and realize lean production.^1,2

At present, the bobbins commonly used in textile workshops are either cylindrical or conical, as shown in Figure 1(a). Commonly used textile yarns come in a variety of colors, as shown in Figure 1(b). During production, the amount of yarn remaining on the yarn frame differs from color to color. Manually replacing the bobbins that are running low is inefficient and can easily lead to a bobbin being replaced with yarn of the wrong color. To avoid this, a high-precision, high-efficiency bobbin detection model is needed that detects bobbin shape and yarn color accurately and in real time, which is of great significance for improving production efficiency and reducing production costs in the textile industry.

Figure 1.

Common bobbins in factories: (a) bobbin shape; (b) yarn color.

This paper studies the bobbin shape and yarn color in the textile workshop, with the aim of detecting and recognizing both automatically. A review of work related to bobbin recognition in textile workshops shows that existing target detection algorithms each have their own advantages but also limitations of varying degree. Karimanzira et al.³ constructed a general target detection system with a pre-trained RCNN convolutional neural network; its detection accuracy reaches 98.392%, but its image processing speed is slow, which makes it unsuitable for high-efficiency production scenes. Lei et al.⁴ improved PANet to achieve multi-scale feature fusion and optimized the confidence loss function to enhance the network's ability to detect targets in blurred underwater images, but the model has many parameters and detects slowly. Zhang et al.⁵ improved the recall rate and average detection accuracy of target detection by improving the backbone network of YoloV3, but the improved model is large and trains inefficiently, which makes it unsuitable for later deployment in a resource-constrained factory production environment. Tang et al.⁶ improved the VGG-16 convolutional neural network by weakening the co-adaptation between its neuron nodes, which improved generalization and achieved an image recognition accuracy of 90.58%, but the model needed extensive experiments and tuning to find the optimal hyperparameters. Su et al.⁷ added spatial and channel attention mechanisms to the target detection model to refine feature transmission during feature fusion and reduce the interference of complex scenes, but this increases the time cost of training and inference and may lead to overfitting.

For target color recognition, Yue et al.⁸ proposed Dif-Fusion, a method that processes multi-spectral data directly with a diffusion model and generates images with high color fidelity through a multi-channel fusion module, significantly improving the accuracy of color detection. Rabie et al.⁹ proposed a detection technique based on color histogram contouring (CHC), which achieves robustness to lighting changes by constructing high-precision chroma feature vectors and emphasizing unique color features. Wang et al.¹⁰ designed a deep convolutional network framework for small infrared target detection, which eliminates false alarms caused by random noise and clutter by exploiting the similarity of motion and radiation between detection results, and integrates target optical flow information into the trajectory to realize spatial-domain detection of infrared targets. Qian et al.¹¹ proposed a color segmentation algorithm that combines a self-organizing map neural network with efficient dense subspace clustering; it achieves high-precision color segmentation, with a color recognition accuracy of 88.3% on complex printed fabrics. Ismael et al.¹² applied the wavelet transform (WT) and an adaptive neuro-fuzzy inference system (ANFIS) to color texture classification and achieved a classification success rate of more than 96% in different color spaces, which verified the effectiveness of the method. The above convolutional neural network (CNN) methods achieve good results in target detection and color recognition, but they cannot perform multi-label recognition.

In textile factory automation, multi-label recognition can detect the bobbin shape and the yarn color at the same time, which reduces the dependence on labor, improves detection efficiency, and avoids errors caused by human factors. To detect the bobbin shape and yarn color on the yarn frame quickly and accurately, we combine the yolov5¹³ model with an original convolutional neural network for yarn color recognition, which ensures high accuracy in the multi-label recognition task. Because the model is to be deployed on embedded devices in the future, keeping it lightweight is also crucial. We name this model for detecting bobbin shape and yarn color YoloColor-Net. The contributions of this study can be summarized as follows:

(1) Based on the official yolov5 model, a convolutional neural network dedicated to yarn color recognition is attached after the Head layer to form a preliminary YoloColor-Net, which completes the multi-label recognition task of detecting the bobbin shape and the yarn color.

(2) The Backbone layer of YoloColor-Net is improved by replacing the ordinary convolution with distribution shift convolution (DSConv).¹⁴ DSConv uses quantization and distribution shift to simulate the behavior of the convolution layer, and it increases model speed and reduces the number of parameters by storing only integer values during operation.

(3) The convolution block attention module (CBAM)^15,16 is improved, and the ICBAM attention mechanism is proposed. ICBAM is then introduced in the early stage of YoloColor-Net feature extraction and in feature fusion to improve the detection accuracy of the model.

The rest of this paper is arranged as follows: Proposed model presents the improved YoloColor-Net framework for bobbin recognition; Experiments and discussion describes the experimental platform built to verify the framework; and Conclusions summarizes the article.

Proposed model

The overall framework of the YoloColor-Net model is shown in Figure 2. It is divided into a bobbin shape detection module and a yarn color recognition module and is composed of four parts: (1) the Backbone layer extracts the features of the input image; (2) the Neck layer fuses feature maps of different scales and passes these features to the Head layer; (3) the Head layer receives the fused feature map information and performs regression prediction; (4) the Output layer extracts the color information from the feature map information and outputs the final result. The DSConv convolution module replaces the ordinary convolution in the Backbone layer of YoloColor-Net to improve model speed and reduce the number of parameters. The improved ICBAM attention mechanism is inserted into the Backbone and Neck layers of YoloColor-Net. In ICBAM, the channel attention module replaces the multi-layer perceptron with a one-dimensional convolution to reduce the loss of feature information, and a multi-scale dilated convolution path is introduced into the spatial attention module to make feature extraction more comprehensive.

Figure 2.

YoloColor-Net model.

Yarn color recognition module

During the research, it was found that when the yolov5 model was used to detect the bobbin shape and the yarn color at the same time, some bobbins were missed.¹⁷ The reason is that yarn color is more subtle and complex than bobbin shape, so some layers in the model may pay too much attention to shape features and ignore color features. To solve this problem, we added a dedicated color recognition branch to the detection head of yolov5 to form the YoloColor-Net model.^18,19 The structure of the yarn color recognition network is shown in Table 1.

Table 1.

Color detection convolutional network structure.

Type	Kernel	Stride	Input size	Output size
Conv	2 × 1 × 1	1	3 × 640 × 640	2 × 640 × 640
ReLU			2 × 640 × 640	2 × 640 × 640
MaxPool	2 × 2	2	2 × 640 × 640	2 × 320 × 320
Conv	3 × 3 × 3	1	2 × 320 × 320	3 × 318 × 318
ReLU			3 × 318 × 318	3 × 318 × 318
AvgPool	2 × 2	2	3 × 318 × 318	3 × 159 × 159
FC			3 × 159 × 159	1000
Normalization			1000	1000
ReLU			1000	1000
FC			1000	8
Softmax			8	8

This branch accepts features from the FPN as input and outputs the color class of each target bounding box. A 1 × 1 convolution kernel captures the correlation between pixels at the same position in different channels rather than between pixels within one channel, so the first layer of a network usually uses a 3 × 3 or 5 × 5 kernel rather than a 1 × 1 kernel. In this experiment, however, the 1 × 1 convolution is used deliberately: it focuses on color channel information, integrates cross-channel information, and reduces the number of learned parameters through dimensionality reduction.²⁰ The convolution output is transformed nonlinearly by the ReLU activation function, and the max pooling layer downsamples the feature map. A 3 × 3 × 3 convolution then increases the channel dimension, followed by ReLU and average pooling as listed in Table 1. The first fully connected layer applies a linear transformation to produce 1000 features, whose output is normalized.^21,22 Finally, the ReLU activation function is applied again to accelerate training, and the second fully connected layer produces the final output. Softmax is used as the color classifier at the end of the network.²³
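The feature map sizes listed in Table 1 can be traced with the usual output-size formula for convolution and pooling layers. The following is a minimal sketch rather than the authors' implementation; the helper names are hypothetical, and zero padding is an assumption inferred from the 320 to 318 step in the table.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer type, output channels, kernel, stride), following Table 1.
# Pooling layers keep the channel count, marked here with None.
# Zero padding everywhere is an assumption, not stated in the paper.
LAYERS = [
    ("Conv", 2, 1, 1),
    ("MaxPool", None, 2, 2),
    ("Conv", 3, 3, 1),
    ("AvgPool", None, 2, 2),
]


def trace_shapes(channels=3, size=640):
    """Return the (channels, height, width) shape after each layer."""
    shapes = [(channels, size, size)]
    for _name, out_channels, kernel, stride in LAYERS:
        size = conv_out(size, kernel, stride)
        if out_channels is not None:
            channels = out_channels
        shapes.append((channels, size, size))
    return shapes


shapes = trace_shapes()
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
for shape in shapes:
    print(shape)
print("inputs to the first fully connected layer:", flattened)
```

Under these assumptions the trace reproduces the 3 × 159 × 159 map that Table 1 lists as the input of the first fully connected layer.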

Bobbin shape recognition module lightweight

In YoloColor-Net, the DSConv convolution module replaces the ordinary convolution of the Backbone layer. When the YoloColor-Net model is trained on bobbin samples, DSConv decomposes the convolution kernel into two components, a variable quantization kernel (VQK) and a distribution shift, and applies kernel-based and channel-based distribution shifts so that the output stays the same as that of the original convolution. Storing only integer values lowers memory usage, which improves model speed and reduces the number of parameters. The model structure of DSConv is shown in Figure 3.

Figure 3.

DSConv model structure.

The original convolution tensor has size $(C_{o}, C_{i}, k_{h}, k_{w})$, where $C_{o}$ is the number of channels in the next layer, $C_{i}$ is the current number of channels, $k_{h}$ is the height of the convolution kernel, and $k_{w}$ is its width. During training, the VQK consists of low-bit integer values and has the same size as the original convolution tensor. The distribution shift component simulates the behavior of the convolution layer by using quantization and distribution shift, and it is divided into two parts: the kernel distribution shift (KDS) and the channel distribution shift (CDS). KDS shifts the distribution of each $(1, B, 1, 1)$ slice of the VQK, where $B$ is a block size hyperparameter; its tensor size is $(C_{o}, \lceil \frac{C_{i}}{B} \rceil, k_{h}, k_{w})$, where $\lceil x \rceil$ is the ceiling operation. CDS is a single-precision tensor of size $2 \cdot (C_{o})$ that shifts the distribution of each $(1, C_{i}, k_{h}, k_{w})$ channel slice.

In the DSConv quantization process, the quantization function takes the number of bits used for quantization in the network as input and stores the integer values in two's complement. For a bit length $b$, the following relationship holds:

$$w_{q} \in \mathbb{Z},\; b \in \mathbb{N} \;\big|\; -2^{b-1} \leq w_{q} \leq 2^{b-1} - 1 \tag{1}$$

where $w_{q}$ is the value of each parameter in the tensor.

Then, the weights of each convolution layer are scaled so that the maximum absolute value of the original weights $w$ matches the maximum value allowed by the quantization constraint above. The new weights $w_{q}$ are stored in memory as integer values for subsequent use in training and inference. By replacing the ordinary convolution with DSConv, the memory saved by each tensor weight is:

$$p = \frac{b}{32} + \frac{\lceil C_{i} / B \rceil}{C_{i}} \tag{2}$$

By shifting the VQK values through KDS and CDS, the weights of each block of the pre-trained network are stretched or rounded to fit the interval in equation (1) and stored in the VQK. The optimal value of KDS is:

$$\xi = \arg\min_{\hat{\xi}} \sum_{i = 0}^{B - 1} \left(w_{q i} \hat{\xi} - w_{i}\right)^{2} \tag{3}$$

Its closed-form solution is:

$$\xi = \frac{\sum_{i = 0}^{B - 1} w_{i} w_{q i}}{\sum_{i = 0}^{B - 1} w_{q i}^{2}}, \quad \forall\, (1, B, 1, 1)\ \text{slices} \tag{4}$$

where $\xi$ is the KDS value for that block.
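The quantization constraint of equation (1) and the closed-form KDS value of equation (4) can be illustrated on a single block of weights. This is a minimal sketch in plain Python, not the DSConv reference implementation: the function names and the example weights are hypothetical, and the scaling is applied per block here for brevity, whereas the text describes scaling the weights of each convolution layer.

```python
def quantize_block(weights, bits):
    """Scale weights so the largest magnitude meets the b-bit limit, then round."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = q_max / max_abs if max_abs > 0 else 1.0
    # Clamp so every value satisfies the range in equation (1).
    return [min(q_max, max(q_min, round(w * scale))) for w in weights]


def kds_value(weights, quantized):
    """Closed-form least-squares shift of equation (4) for one block."""
    denom = sum(q * q for q in quantized)
    if denom == 0:
        return 0.0
    return sum(w * q for w, q in zip(weights, quantized)) / denom


def squared_error(weights, quantized, xi):
    """Objective of equation (3) evaluated at a candidate shift xi."""
    return sum((q * xi - w) ** 2 for w, q in zip(weights, quantized))


block = [0.12, -0.40, 0.33, 0.05]  # hypothetical float weights, block size B = 4
w_q = quantize_block(block, bits=4)
xi = kds_value(block, w_q)
print("quantized block:", w_q)
print("KDS value:", xi)
print("reconstruction:", [q * xi for q in w_q])
```

Multiplying the stored integers by the single float $\xi$ approximates the original block, which is why only the integers plus one shift value per block need to be kept.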

In summary, we replace the ordinary convolution of the YoloColor-Net Backbone network with DSConv to speed up the model and reduce its storage space.

Add attention mechanism ICBAM

In a conventional neural network, the output of each neuron depends only on the outputs of the neurons in the previous layer. With an attention mechanism, the output additionally depends on weights assigned to different parts of the input data. Adding an attention mechanism therefore lets the network pay more attention to the key information in the input image, which improves the accuracy and efficiency of the model.

Because the background interferes with bobbin detection, the CBAM attention mechanism is adopted to suppress complex background interference and extract the key pixel areas.¹⁶ After analyzing the shortcomings of CBAM, we optimize its channel attention module and spatial attention module and insert the resulting ICBAM attention mechanism into the Backbone and Neck of YoloColor-Net. This improves both the model's ability to suppress complex background interference when detecting bobbins and its detection accuracy. The specific improvements are as follows:

Improvement of channel attention module

The channel attention module of ICBAM is improved as follows. Given the input feature map $F \in R^{C \times H \times W}$, global average pooling and global maximum pooling are first performed separately over the spatial dimensions. Global average pooling highlights the global feature information of the image, and maximum pooling highlights its salient feature information.²⁴ The two pooling results are added channel by channel to obtain $F_{2}$, which carries more comprehensive feature information. Then, a one-dimensional convolution with kernel length $k$ maps the feature information between $k$ adjacent channels in $F_{2}$. Finally, the Sigmoid activation function normalizes the values to the range 0∼1 to generate the channel attention weight $M_{C}$. The improved module is shown in Figure 4, and its formula is:

$$M_{C}(F) = \sigma\left(f_{1D}^{k}\left(\mathrm{AvgPool}(F) + \mathrm{MaxPool}(F)\right)\right) \tag{5}$$

where $f_{1D}^{k}$ represents a one-dimensional convolution operation with a convolution kernel size of $k$. The size $k$ of the one-dimensional convolution kernel can be determined adaptively from the number of channels $C$ of the input feature map. The formula is:

$$k = \left| \frac{\log_{2} C}{2} + \frac{1}{2} \right|_{odd} \tag{6}$$

where $|x|_{odd}$ represents the odd number closest to $x$ upward.

Figure 4.

ICBAM channel attention module structure.
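The channel attention path of equation (5) and the adaptive kernel size of equation (6) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: equation (6) is read as $k = |\log_{2}(C)/2 + 1/2|_{odd}$, the rounding follows the convention common in ECA-Net-style implementations (truncate, then move an even result up to the next odd integer), and the uniform kernel weights and the toy feature map are hypothetical stand-ins for learned and real values.

```python
import math


def adaptive_kernel_size(channels, gamma=2, b=1):
    """Kernel length k from the channel count (rounding rule is an assumption)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def channel_attention(feature_map, kernel_weights):
    """feature_map: C x H x W nested lists. Returns one weight in (0, 1) per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        # Sum of global average pooling and global max pooling, as in equation (5).
        pooled.append(sum(values) / len(values) + max(values))
    k = len(kernel_weights)
    pad = k // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    # 1D convolution across neighbouring channels with zero padding.
    mixed = [
        sum(kernel_weights[j] * padded[c + j] for j in range(k))
        for c in range(len(pooled))
    ]
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


# Toy 4-channel, 2 x 2 feature map with hypothetical values.
toy_map = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[1.0, 0.0], [0.5, 0.5]],
    [[-0.2, 0.1], [0.0, 0.3]],
    [[0.7, 0.7], [0.7, 0.7]],
]
kernel = [1.0 / 3.0] * 3  # hypothetical weights; a trained layer would learn these
attention = channel_attention(toy_map, kernel)
print("k for 64, 256, 1024 channels:", [adaptive_kernel_size(c) for c in (64, 256, 1024)])
print("channel weights:", attention)
```

Compared with a multi-layer perceptron, the 1D convolution needs only $k$ weights and involves no channel reduction, which is the motivation the text gives for the change.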

Improvement of spatial attention

The spatial attention module is improved as follows. Given the input feature map $F \in R^{C \times H \times W}$, global average pooling and global maximum pooling are first carried out separately along the channel dimension. The two resulting feature maps are concatenated along the channel dimension to obtain $F_{3}$. Then $F_{3}$ is fed into the multi-scale convolution path to extract feature information with different receptive fields.²⁵ The path consists of a 1 × 1 convolution layer and three 3 × 3 dilated convolution layers with dilation factors of 1, 2, and 3, respectively. The outputs of the multi-scale convolution branches are then concatenated, and a 1 × 1 convolution layer reduces the channel dimension. Finally, the Sigmoid activation function normalizes the values to the 0∼1 interval to generate the spatial attention weight $M_{S}$.²⁶ The improved module is shown in Figure 5, and its process formula is:

$$\Lambda(\cdot) = \left[\mathrm{Conv}_{(1,1)}(\cdot),\ \mathrm{Conv}_{(3,1)}(\cdot),\ \mathrm{Conv}_{(3,2)}(\cdot),\ \mathrm{Conv}_{(3,3)}(\cdot)\right] \tag{7}$$

$$M_{S}(F) = \sigma\left(f^{1 \times 1}\left(\mathrm{concat}\left(\Lambda\left[\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)\right]\right)\right)\right) \tag{8}$$

where $\Lambda$ represents the multi-scale convolution path; $\mathrm{Conv}_{(i, j)}$ represents the convolution operation using a dilated convolution kernel with an original kernel size of $i$ and a dilation factor of $j$; and $f^{1 \times 1}$ represents the convolution operation with a kernel size of 1 × 1.

Figure 5.

ICBAM spatial attention module structure.
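The effect of the dilation factors in equation (7) can be made concrete with the standard relation between kernel size, dilation, and effective receptive field. The sketch below is illustrative; the function names are hypothetical, and the padding shown is simply the value that keeps the spatial size unchanged at stride 1, which the branches need so that their outputs can be concatenated.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def same_padding(kernel, dilation):
    """Padding that preserves the spatial size at stride 1."""
    return (effective_kernel(kernel, dilation) - 1) // 2


# (kernel size i, dilation factor j) of the four branches in equation (7).
BRANCHES = [(1, 1), (3, 1), (3, 2), (3, 3)]

spans = [effective_kernel(k, d) for k, d in BRANCHES]
paddings = [same_padding(k, d) for k, d in BRANCHES]
for (k, d), span, pad in zip(BRANCHES, spans, paddings):
    print(f"Conv({k},{d}): covers {span} x {span} pixels, padding {pad}")
```

The four branches thus see 1 × 1, 3 × 3, 5 × 5, and 7 × 7 neighbourhoods while each 3 × 3 branch still holds only nine weights per channel pair.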

Experiments and discussion

In this section, the performance of the proposed model is evaluated through comparison and ablation experiments. The experimental platform parameters are shown in Table 2. The SGD optimizer is used to optimize the model, with an initial learning rate of 0.01, a momentum of 0.953, a momentum decay of 5 × 10^(-5), and a batch size of 32; the activation function is ReLU. After activation, batch normalization is used to accelerate training and avoid vanishing and exploding gradients. Early stopping is also used: if the AP value on the validation set has not increased after 200 epochs, training is stopped early. This prevents the model from overfitting and yields good generalization performance.²⁷
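The early stopping rule described above can be sketched as a small helper. This is a minimal sketch, assuming the rule means "stop once the validation AP has gone 200 consecutive epochs without improving"; the class name, the shortened patience, and the AP history in the demonstration are hypothetical.

```python
class EarlyStopping:
    """Stop training when validation AP has not improved for `patience` epochs."""

    def __init__(self, patience=200):
        self.patience = patience
        self.best = float("-inf")
        self.epochs_without_gain = 0

    def step(self, ap):
        """Record one validation AP value; return True when training should stop."""
        if ap > self.best:
            self.best = ap
            self.epochs_without_gain = 0
        else:
            self.epochs_without_gain += 1
        return self.epochs_without_gain >= self.patience


# Demonstration with a short patience and a made-up AP history.
history = [0.90, 0.93, 0.95, 0.95, 0.94, 0.95, 0.96]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, ap in enumerate(history):
    if stopper.step(ap):
        stopped_at = epoch
        break
print("stopped at epoch index:", stopped_at, "best AP:", stopper.best)
```

In a real training loop the weights from the best epoch would normally be saved whenever `best` is updated, so that the returned model is the one with the highest validation AP.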

Table 2.

Experimental platform parameter settings.

Parameter	Value
Operating system	Windows 11
Memory capacity	32 GB
GPU	NVIDIA GeForce RTX 4060 Ti
CPU	Intel(R) Core i7-14650HX
Model framework	PyTorch 1.7.1
Programming language	Python 3.8

Construction of experimental samples

The experimental samples were collected at the Textile Intelligent Manufacturing Laboratory of Zhejiang Sci-Tech University. The collection equipment is a Hikrobot MV-CA060-10GC color camera with a resolution of 3072 × 2048. The distance between the camera end face and the creel end face is 695 mm, the camera focal length is 6 mm, the pixel size is 2.4 μm/pixel, and a strip light source is used as the working light source during image collection.^28–30 The sample collection scene is shown in Figure 6.

Figure 6.

Sample collection site: (a) Strip light source; (b) Camera installation location.

The experimental samples cover the eight types of bobbin most commonly used in the factory, and a total of 12,173 bobbin pictures were collected. For each type of bobbin, 1000 pictures were selected, giving an experimental data set of 8000 pictures. The detailed data set distribution is shown in Table 3. After labeling, the experimental data set is randomly shuffled; 70% is used as the training set, and 30% is used as the test set to evaluate the performance of the model.

Table 3.

Dataset details.

Color	Color discrimination	Type	Number
White	T800	Cone	1000
White	T800	Cylinder	1000
Black	J1C1738	Cylinder	1000
Blue	230,497	Cone	1000
Plum red	191,863	Cylinder	1000
Yellow brown	8105	Cone	1000
Orange red	151,263	Cone	1000
Red	54NM2	Cone	1000
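The shuffle-and-split step can be sketched as follows. This is an illustration only: the sample identifiers are placeholders for image files, the fixed seed is an assumption added for reproducibility, and the split is a plain random split as described in the text rather than a stratified one.

```python
import random

# (color, shape) pairs of the eight bobbin types listed in Table 3.
CLASSES = [
    ("White", "Cone"),
    ("White", "Cylinder"),
    ("Black", "Cylinder"),
    ("Blue", "Cone"),
    ("Plum red", "Cylinder"),
    ("Yellow brown", "Cone"),
    ("Orange red", "Cone"),
    ("Red", "Cone"),
]


def split_dataset(images_per_class=1000, train_ratio=0.7, seed=0):
    """Shuffle all labeled samples and split them into training and test sets."""
    samples = [
        (color, shape, index)
        for color, shape in CLASSES
        for index in range(images_per_class)
    ]
    random.Random(seed).shuffle(samples)
    cut = round(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


train_set, test_set = split_dataset()
print("training images:", len(train_set), "test images:", len(test_set))
```

With 8000 images, the 70/30 split yields 5600 training images and 2400 test images.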

Color recognition module experiment

In the multi-label recognition task, adding the color recognition module to the convolutional neural network (CNN) lets the model extract bobbin shape features and yarn color features separately. This avoids the difficulty of fusing features whose representations in the image differ, makes the details of the bobbin features more prominent, and sharpens the contours in the feature map, as shown in Figure 7.

Figure 7.

Comparison of feature maps before and after adding the color recognition module.

In the experiments, separate loss functions are set for the shape and color features so that the model attends to both during optimization, which improves detection accuracy. The effect before and after adding the color recognition convolution layers is compared in Figure 8.

Figure 8.

Experimental diagram of bobbin detection: (a) bobbins missed before adding the color recognition module; (b) all bobbins detected after adding the color recognition module.

Detection accuracy

To illustrate the advantages of the improved model, its performance is analyzed quantitatively on the test set of the bobbin data set in terms of precision, recall, and mean average precision (mAP). Before calculating these indicators, we first count the true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN), whose meanings are shown in Figure 9:

Figure 9.

Definition of TP, FN, FP and TN.

Precision refers to the proportion of the number of positive samples correctly predicted by the model to the number of all positive samples predicted by the model. Its formula is:

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{9}$$

Recall refers to the proportion of the number of positive samples correctly predicted by the model to the number of true positive samples. Its formula is:

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{10}$$

AP (average precision) measures the average precision of the model for a single category and reflects how well the model predicts positive samples. When multiple categories are involved, the mean AP over all categories (mAP) is used as the evaluation index of the whole model, and its formula is:

$$\mathrm{mAP} = \frac{\sum_{i = 1}^{n} AP_{i}}{n} \tag{11}$$

where $AP_{i}$ is the AP value of type $i$, and $n$ is the number of categories in the dataset.
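Equations (9) to (11) translate directly into code. The sketch below uses hypothetical counts and AP values purely to show the arithmetic; it does not compute AP itself, which requires ranking detections by confidence and integrating the precision-recall curve.

```python
def precision(tp, fp):
    """Equation (9): share of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Equation (10): share of actual positives that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_average_precision(ap_values):
    """Equation (11): mean of the per-category AP values."""
    return sum(ap_values) / len(ap_values)


# Hypothetical counts for one category and hypothetical per-category AP values.
example_precision = precision(tp=90, fp=10)
example_recall = recall(tp=90, fn=30)
example_map = mean_average_precision([0.5, 1.0])
print(example_precision, example_recall, example_map)
```

A missed bobbin counts as a false negative and lowers recall, while a bobbin assigned the wrong shape or color counts as a false positive and lowers precision.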

The contribution of each improved module to bobbin recognition is studied through ablation experiments, and the results are shown in Table 4. Using the DSConv module alone raises mAP@0.5 slightly, by 0.5 percentage points over the original model (0.957 to 0.962); using the improved ICBAM attention mechanism alone gives an mAP@0.5 that is 1.4 percentage points higher than with CBAM (0.978 versus 0.964); and combining DSConv with ICBAM raises mAP@0.5 by 3.6 percentage points over the original model (0.957 to 0.993).

Table 4.

Comparison of detection accuracy of YoloColor-Net.

YoloColor-Net	DSConv	CBAM	ICBAM	Precision	Recall	mAP@0.5
√				0.949	0.921	0.957
√	√			0.954	0.928	0.962
√		√		0.962	0.963	0.964
√			√	0.976	0.970	0.978
√	√	√		0.981	0.983	0.989
√	√		√	0.991	0.989	0.993

To verify the effect of the improved ICBAM attention mechanism, comparative experiments were conducted on different attention mechanisms under the same experimental conditions, and the results are shown in Table 5. The model using ICBAM reaches the highest mAP@0.5, 99.3%, and the model using the unmodified CBAM (98.9%) also outperforms those using Shuffle-attention, CA, and SENet. This indicates that CBAM is well suited to the multi-label recognition task for bobbins, which is why CBAM was chosen as the basis for improvement in this study.

Table 5.

mAP values of YoloColor-Net with different attention mechanisms.

Types of attention mechanisms	mAP@0.5
YoloColor-Net + DSConv + SENet	0.981
YoloColor-Net + DSConv + CA	0.983
YoloColor-Net + DSConv + Shuffle-attention	0.987
YoloColor-Net + DSConv + CBAM	0.989
YoloColor-Net + DSConv + ICBAM	0.993

Figure 10 shows the heat maps of the bobbin. The redder an area in the heat map, the greater the weight the model assigns there. The ICBAM attention mechanism localizes the model's weight most completely on the end face of the bobbin, followed by the CBAM attention mechanism. The figure also shows that with the Shuffle-Attention, SENet, and CA attention mechanisms the model does not localize well on the end face of the bobbin, which makes it prone to false and missed detections and limits its detection accuracy. Therefore, the detection accuracy of the model is higher when the ICBAM attention mechanism is added.

Figure 10.

Heat map distribution of YoloColor-Net with different attention mechanisms added.

Computational cost analysis

In the follow-up study, the YoloColor-Net model we proposed needs to be deployed in the textile workshop with limited computing resources.³¹ The computational cost of the model is directly related to whether it can be deployed and run in the automated production process. Therefore, this section analyzes the computational cost of the model to provide guidance for further optimization of the model.

Table 6 compares the parameters, floating-point operations, and detection speed of the YoloColor-Net model at different improvement stages. With the DSConv lightweight convolution block and the ICBAM attention mechanism, the number of parameters falls from 7225885 to 6472341, a reduction of 10.4%, and GFLOPS falls from 16.4 to 13.6, a reduction of 17.1%. The detection speed reaches 67.23 FPS, 55.9% higher than that of the initial model, which meets the real-time requirement of more than 30 FPS for target detection in industrial applications.

Table 6.

Comparison of YoloColor-Net detection performance indicators.

Improvement phase	Parameters	GFLOPS	FPS
YoloColor-Net	7225885	16.4	43.13
YoloColor-Net + DSConv	6710830	14.9	65.71
YoloColor-Net + CBAM	7314672	15.3	42.46
YoloColor-Net + ICBAM	6923354	16.8	62.18
YoloColor-Net + DSConv + CBAM	7031701	15.8	52.85
YoloColor-Net + DSConv + ICBAM	6472341	13.6	67.23
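The percentages quoted for the final model follow from the first and last rows of Table 6. The short check below reproduces that arithmetic; the dictionary layout and function name are illustrative choices.

```python
# First and last rows of Table 6.
TABLE_6 = {
    "YoloColor-Net": {"params": 7225885, "gflops": 16.4, "fps": 43.13},
    "YoloColor-Net + DSConv + ICBAM": {"params": 6472341, "gflops": 13.6, "fps": 67.23},
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


baseline = TABLE_6["YoloColor-Net"]
final = TABLE_6["YoloColor-Net + DSConv + ICBAM"]
changes = {key: percent_change(baseline[key], final[key]) for key in baseline}
for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
```

Negative values are reductions (parameters and GFLOPS) and the positive value is the gain in FPS.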

Figure 11 shows the memory occupied by the model. The YoloColor-Net model occupies 27.56 MB before improvement and 24.69 MB after improvement. Comparing the Parameters, GFLOPS, and FPS of the model at different improvement stages shows directly how each improvement scheme affects the computational cost, and it further indicates that, among the variants tested, the proposed model achieves the best balance between detection performance and computational efficiency. Although the current model achieves good detection performance, its number of parameters and memory consumption remain the main factors restricting its detection speed.^32,33

Figure 11.

Comparison of memory usage at different improvement stages.
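The reported memory figures can be cross-checked against the parameter counts in Table 6. The sketch below assumes that each parameter is stored as a 32-bit float and that sizes are expressed in units of 2^20 bytes; the paper does not state how memory was measured, so this is a plausibility check rather than a description of the authors' procedure.

```python
BYTES_PER_FP32 = 4  # assumption: weights stored as 32-bit floats


def model_size_mib(parameters, bytes_per_value=BYTES_PER_FP32):
    """Storage needed for the weights alone, in units of 2**20 bytes."""
    return parameters * bytes_per_value / 2 ** 20


size_before = model_size_mib(7225885)  # YoloColor-Net
size_after = model_size_mib(6472341)   # YoloColor-Net + DSConv + ICBAM
print(f"before: {size_before:.2f}, after: {size_after:.2f}")
```

Under these assumptions the weights alone account for the 27.56 and 24.69 figures reported for the original and improved models.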

Conclusions

In this paper, we propose YoloColor-Net, a deep learning framework for multi-label recognition that detects and recognizes the bobbin shape and yarn color in the textile workshop at the same time.

Adding a yarn color recognition module to the detection head of Yolov5 avoids the missed detections that occur when the Yolov5 model detects the bobbin shape and the yarn color at the same time, and it forms the initial YoloColor-Net model. To reduce the number of model parameters, distribution shift convolution (DSConv) replaces the ordinary convolution in the Backbone layer of YoloColor-Net. In addition, the convolution block attention module (CBAM) is improved to address its shortcomings, and the resulting ICBAM attention mechanism is added to the Backbone and Neck layers of YoloColor-Net, which keeps the model lightweight while improving detection accuracy.

In the multi-label recognition experiment, the YoloColor-Net model completes the task of detecting bobbin shape and yarn color simultaneously, with a Precision of 99.1%, a Recall of 98.9%, and an mAP@0.5 of 99.3%. By improving the Backbone layer and adding the ICBAM attention mechanism, the Parameters of the model are reduced by 10.4%, GFLOPS is reduced by 17.1%, and FPS is increased by 55.9%, which gives the model good prospects for industrial application. However, the number of parameters and the memory occupation of the model remain the main factors limiting further improvement of its detection speed.

A lightweight model not only reduces the demand for hardware resources but also lowers production costs. It also responds to production demand faster, reduces production stoppages caused by computation delay, and improves overall production efficiency. Therefore, optimizing model compression to make the model more lightweight while preserving its accuracy is the direction of further research.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Jindou Zhang

Yang Li
