Abstract
Automatic fabric defect detection systems based on computer vision are used to replace manual inspection. In this paper, we propose a hardware-accelerated algorithm based on a small-scale over-complete dictionary (SSOCD) trained via sparse coding (SC), which is implemented on a parallel hardware platform (TMS320C6678). To reduce computation, the projections of image patches onto the trained SSOCD are taken as features. These features are more robust and offer clear advantages in both detection results and computational cost. Furthermore, we introduce the detection ratio and false ratio to measure the performance and reliability of the hardware-accelerated algorithm. The experiments show that the proposed algorithm runs with high parallel efficiency and that its detection speed meets the real-time requirements of industrial inspection.
1. Introduction
With the rapid development of image recognition and hardware platforms, numerous fabric defect detection algorithms have been proposed in recent years. One of them is the grey-level co-occurrence matrix (GLCM) [1, 2, 3], which describes the second-order texture features of images. However, because its matrix dimensions are excessive, the GLCM cannot reliably distinguish defects in images. Multi-resolution (multi-scale) analysis methods such as adaptive wavelet transforms (WT) [4] and the Gabor transform [5] have also been introduced to detect local defects by providing local resolution in the horizontal, vertical and diagonal directions. In [7], independent component analysis (ICA) was used to obtain independent components of normal fabric texture that differ from those of defective texture, and detection was based on the difference between these independent components. In addition, the combination of WT and ICA [8], Markov random fields [9] and SC [10, 11] have been applied to fabric defect detection with good detection performance. However, challenges such as limited computational resources and the required detection ratio must still be overcome before these algorithms can be applied in real industrial inspection. In this paper, we propose a fabric defect detection algorithm based on an improved SC method that trains an SSOCD; the projections of image patches onto the trained SSOCD are then taken as detection features, which offer clear advantages in both detection results and computational cost. Before training, the fabric images are preprocessed with a Gabor filter, following classic frequency-domain defect detection algorithms, which reduces the complexity of reconstruction and the influence of noise. The SSOCD is trained once this preprocessing is complete.
Compared with conventional processors, multi-core parallel hardware architectures are better suited to compute-intensive applications. The TMS320C6678, an eight-core digital signal processor (DSP) integrated on a single chip, was recently developed by Texas Instruments. We employed this processor because of its outstanding fixed- and floating-point processing performance, its optimized KeyStone multi-core architecture, its flexible expandability and its ability to combine the strengths of its eight cores to improve overall performance. In this paper, we propose a parallel acceleration algorithm for the DSP, based on a feature split, to accelerate SSOCD feature extraction. The detection system also optimizes the synchronization and communication between cores, which reduces the serial portion of the algorithm's running time and allows the system to achieve a good parallel speedup ratio and parallel efficiency. Moreover, optimizing the DSP program to take full advantage of parallel processing greatly reduces the algorithm's processing time. Finally, our analysis shows that the proposed algorithm runs with high parallel efficiency and that its detection speed meets the requirements of industrial inspection.
In a real industrial system, cameras capture images from the fabric production channel, and a fabric defect detection model automatically recognizes fabric defects and raises alerts. To achieve automatic fabric defect detection via computer vision, our detection application consists of (1) acquiring fabric images in an industrial environment; (2) training the SSOCD via SC using a series of defect-free fabric images; (3) computing the features of the fabric images to be inspected; and (4) judging and recognizing defects.
The rest of this paper is structured as follows. Section 2 describes the fabric defect detection flow via SC and gives an overview of SSOCD extraction. Section 3 introduces a parallel hardware acceleration method for defect detection. Section 4 analyses the experimental results and shows that our algorithm has superior robustness. Section 5 concludes the paper from the viewpoint of industrial application.
2. Overview of SSOCD-Based SC for Fabric Defect Detection
The main problem with traditional WT-based fabric defect detection methods is their lack of adaptability, since their descriptors are predefined rather than learned from the fabric. Thus, a series of training-based methods such as ICA [7, 8] and SC [10, 11], which can flexibly learn effective descriptors from training fabric samples, have been proposed to describe fabric texture features. In this paper, we chose SC for fabric defect detection because its efficient feature extraction scheme is better suited to parallel hardware acceleration.
In fabric defect detection via SC, computing the sparse representation requires significant computational resources, so current hardware struggles to meet the demands of real-time industrial detection. We therefore propose using an SSOCD for defect detection. This requires three major steps; below, we briefly explain the flow of fabric defect detection via the SSOCD.
2.1. Training SSOCD of SC
The process of training the SSOCD via SC mimics the information processing of the human visual cortex [12], and the trained SSOCD reflects, to some degree, the most basic elements of an image. Assume that
where

Fig. 1. Overview of SSOCD training
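The training equations did not survive in this version of the text. As a generic illustration only, the sparse-coding step inside dictionary training is commonly posed as minimizing ½‖x − Da‖² + λ‖a‖₁ over the coefficient vector a, for a patch x and a dictionary D whose K columns (atoms) each have N entries. The Python sketch below solves that problem with ISTA (iterative shrinkage-thresholding), a standard solver that may differ from the authors' own; the function names and the weight `lam` are hypothetical, and the dictionary-update half of training is not shown.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code_ista(x, atoms, lam, iterations=500):
    """Approximately minimize 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    x:     patch as a flat list of N pixel values
    atoms: the K dictionary atoms (columns of D), each a flat list of N values
    lam:   sparsity weight (hypothetical; no value is given in the text)
    """
    n, k = len(x), len(atoms)
    # Step size 1/L: the squared Frobenius norm of D is a safe upper bound on
    # the largest eigenvalue of D^T D, so the iteration converges.
    lipschitz = sum(v * v for d in atoms for v in d)
    step = 1.0 / lipschitz
    a = [0.0] * k
    for _ in range(iterations):
        # Residual r = D a - x for the current coefficients
        r = [sum(a[j] * atoms[j][i] for j in range(k)) - x[i] for i in range(n)]
        # Gradient step on the quadratic term, then soft-thresholding
        a = [soft_threshold(a[j] - step * sum(atoms[j][i] * r[i] for i in range(n)),
                            step * lam)
             for j in range(k)]
    return a


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(sparse_code_ista([3.0, -0.5, 1.5], identity, lam=1.0))
```

With an identity dictionary the problem separates per pixel, so the result converges to each value soft-thresholded by λ, here approximately [2.0, 0.0, 0.5].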
2.2. Feature Extraction
Feature selection is an important part of the detection process. In [10, 11], once the dictionary has been trained, the coding coefficients are calculated and the resulting reconstruction error is used as the feature for detecting defects. The reconstruction error is calculated as follows:
where
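The exact error formula is missing here. Assuming the usual squared Euclidean norm, the reconstruction error of a patch x with sparse code a over dictionary D is ‖x − Da‖², which the following sketch computes (the function names are hypothetical). Obtaining a for every patch requires solving a sparse-coding problem per patch, which is the computational cost that the projection features described next are meant to avoid.

```python
def reconstruct(atoms, coeffs):
    """Return D a, i.e. the sum over k of coeffs[k] * atoms[k]."""
    n = len(atoms[0])
    return [sum(c * d[i] for c, d in zip(coeffs, atoms)) for i in range(n)]


def reconstruction_error(x, atoms, coeffs):
    """Squared Euclidean reconstruction error ||x - D a||^2 (assumed form)."""
    recon = reconstruct(atoms, coeffs)
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(reconstruction_error([3.0, -0.5, 1.5], identity, [2.0, 0.0, 0.5]))  # 2.25
```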
We instead take the projection of each image patch onto the trained SSOCD as the detection feature of an image. For an image patch
Fig.2 shows the process of extracting projection features. A fabric image to be inspected is preprocessed in a similar way to the images used in training, and the last projection feature of this fabric image,

Fig. 2. The process of feature extraction via projection
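Since Section 3 notes that feature extraction can be converted into an image template convolution, one plausible reading is that the k-th feature image holds, at every position, the inner product between the k-th SSOCD atom (reshaped into a ph×pw template) and the patch at that position. The sketch below assumes a stride of one pixel, a row-major atom layout and "valid" borders; none of these details is stated in the text, and the function name is hypothetical. Unlike the reconstruction error, each feature costs only ph·pw multiply-adds per position and atom, with no per-patch optimization.

```python
def projection_feature_images(image, atoms, ph, pw):
    """Project every ph x pw patch of `image` onto each dictionary atom.

    image: 2-D list of grey values (rows of equal length)
    atoms: list of K atoms, each a flat list of ph*pw values in row-major order
    Returns K feature images of size (H-ph+1) x (W-pw+1); entry (r, c) is the
    inner product <d_k, x> for the patch whose top-left corner is (r, c),
    i.e. a 'valid' 2-D correlation with the atom used as a template.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - ph + 1, w - pw + 1
    features = []
    for d in atoms:
        fmap = [[0.0] * out_w for _ in range(out_h)]
        for r in range(out_h):
            for c in range(out_w):
                acc = 0.0
                for i in range(ph):
                    for j in range(pw):
                        acc += d[i * pw + j] * image[r + i][c + j]
                fmap[r][c] = acc
        features.append(fmap)
    return features


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
maps = projection_feature_images(image, [[1, 1, 1, 1], [1, 0, 0, 0]], 2, 2)
print(maps)  # [[[12.0, 16.0], [24.0, 28.0]], [[1.0, 2.0], [4.0, 5.0]]]
```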

Fig. 3. The process of feature extraction via projection
Because the SSOCD is also trained on fabric with normal texture, the sparse representation of a normal-texture patch can be computed with little reconstruction error. For an SSOCD atom (vector) whose corresponding sparse coefficient is nonzero, the projection of a normal image patch onto that atom should be large, whereas the projection of a defective patch should be small. In other words, although the projection feature
Fig.3 shows that these three features (
is calculated by Eq. 6 constructs the feature images of
2.3. Fabric Detection
In this paper, the average
In Eq. 7,
Eq. 8 determines whether an image is classified as a defect image, where V is the distance between the image to be inspected and the trained image,

Fig. 4. The process of detection based on features
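Eqs. 7 and 8 are missing from this text, so the decision rule below is only a hypothetical sketch consistent with the surrounding description: each feature image is averaged with a window×window mean filter (the paper's mean filtering template is 16*16), the local mean vector is compared with the mean feature vector measured on defect-free training images, and a window is flagged as defective when this distance V exceeds a threshold. The Euclidean distance, the threshold and the function names are assumptions.

```python
import math


def box_mean(fmap, size):
    """'Valid' mean filter: average of each size x size window of a 2-D list."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for r in range(h - size + 1):
        row = []
        for c in range(w - size + 1):
            total = sum(fmap[r + i][c + j] for i in range(size) for j in range(size))
            row.append(total / (size * size))
        out.append(row)
    return out


def detect_defects(feature_images, trained_means, window, threshold):
    """Flag windows whose mean-feature vector is far from the defect-free one.

    feature_images: K feature images of the image under inspection
    trained_means:  K mean feature values measured on defect-free images
    Returns a 2-D list of booleans (True = defect) and the distance map V.
    """
    means = [box_mean(f, window) for f in feature_images]
    h, w = len(means[0]), len(means[0][0])
    dist = [[math.sqrt(sum((m[r][c] - mu) ** 2
                           for m, mu in zip(means, trained_means)))
             for c in range(w)] for r in range(h)]
    flags = [[v > threshold for v in row] for row in dist]
    return flags, dist


flags, dist = detect_defects([[[1, 1, 5], [1, 1, 5]]], trained_means=[1.0],
                             window=2, threshold=1.0)
print(flags, dist)  # [[False, True]] [[0.0, 2.0]]
```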
3. Parallel Hardware Acceleration for Fabric Defect Detection via SC
The eight-core TMS320C6678 DSP has a hardware architecture well suited to the parallel acceleration of fabric defect detection. Fig.5 summarizes the actual data processing procedure of fabric defect detection via the SSOCD. It contains three important modules, namely Gabor filtering preprocessing, feature extraction and mean filtering, all of which can be converted into image template convolutions. The final size of the Gabor filtering convolution template is 5*5, that of the mean filtering convolution template is 16*16, and the final scale K and dimension N of the SSOCD are 16 and 30, respectively. Based on the characteristics of the detection algorithm, we propose a parallel acceleration architecture based on feature splitting.
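The text fixes only the size of the Gabor template (5*5). For illustration, the sketch below samples the real part of the standard 2-D Gabor function g(x, y) = exp(−(x′² + γ²y′²)/(2σ²))·cos(2πx′/Λ + ψ), where x′, y′ are the coordinates rotated by the orientation θ and Λ is the wavelength, on an odd-sized grid. The parameter values in the usage line are arbitrary rather than taken from the paper, and the function name is hypothetical.

```python
import math


def gabor_template(size, sigma, theta, wavelength, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor function sampled on a size x size grid.

    `size` must be odd so that the template has a centre pixel. All parameter
    values are illustrative; the text gives only the 5*5 template size.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates to the filter orientation theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


k = gabor_template(5, sigma=1.5, theta=0.0, wavelength=4.0)
print(len(k), len(k[0]), k[2][2])  # 5 5 1.0
```

The resulting 5*5 list can be used directly as a convolution template, for example with the row-wise convolution described in Section 3.2.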

Fig. 5. The parallel acceleration flow of defect detection
3.1. Hardware Acceleration Architecture Based on Feature Split
The processing flow of this architecture is organized around the allocation of work to the eight cores. Fig.6 shows the parallel acceleration flow based on a feature split. First, when the Gabor filter preprocesses an image acquired from an industrial camera, the image to be inspected is divided into eight equal blocks, and each block is assigned to one core. Each block must retain overlapping pixels along its edges to eliminate the invalid data produced by convolution filtering at block borders. The eight preprocessed blocks are then synchronized and assembled into one complete image. Next comes feature extraction from the image to be inspected, which is the core step. Because K, the number of SSOCD atoms, determines both the dimension of the image features and the detection quality, we accelerate detection through a feature split that distributes the extraction of the K feature dimensions among the cores. Finally, the result is obtained by judging the distance between the inspected image and the trained images.
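The work partition described above can be sketched as follows, under stated assumptions: the image is split into horizontal strips of rows (the text says only "eight equal image blocks"), each strip is padded with a halo of overlapping rows equal to half the template size (2 rows for the 5*5 Gabor template), and the K = 16 atoms are assigned to the eight cores in contiguous groups. The sketch computes only the partition; inter-core synchronization and data transfer on the DSP are not modelled, and all names are hypothetical.

```python
def block_row_ranges(height, cores, halo):
    """Split image rows into `cores` equal strips, each padded with `halo`
    overlapping rows on both sides (clipped at the image border).

    Returns half-open (start, stop) row ranges, one per core.
    """
    base = height // cores
    ranges = []
    for c in range(cores):
        start = c * base
        stop = height if c == cores - 1 else (c + 1) * base
        ranges.append((max(0, start - halo), min(height, stop + halo)))
    return ranges


def split_features(k, cores):
    """Assign the K feature dimensions (dictionary atoms) to cores in
    contiguous, near-equal groups."""
    return [list(range(c * k // cores, (c + 1) * k // cores)) for c in range(cores)]


# Taking 512 as the number of image rows (an assumption) and a 5*5 template:
print(block_row_ranges(512, 8, 5 // 2))
print(split_features(16, 8))
```

Under these assumptions the first core processes rows 0 to 65, the second rows 62 to 129 and the last rows 446 to 511, and each core extracts two of the sixteen feature dimensions.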

Fig. 6. The parallel acceleration flow of defect detection
3.2. Hardware Acceleration at the Code Level
For fabric defect detection via the SSOCD, the computation is dominated by mask convolution, which consists of multiply-add operations between the mask coefficients and the fabric image to be inspected. Computing mask convolution directly in two dimensions leads to poorly optimized compiled code, long register occupancy and low instruction- and data-level parallelism. In particular, as the number of image columns grows, successive rows of image data are separated by larger address jumps, which lowers the cache read/write hit ratio and increases the CPU resources required.
To reduce time consumption, we introduce a one-dimensional local convolution method that compiles more efficiently. For example, with a 5*5 Gabor template, the local convolution result for a pixel in a given image row is obtained by multiplying the grey values of 5 pixels in one row of its neighbourhood by the corresponding row of Gabor template coefficients; the local results for the whole row are then added to a cache array. After the same operation has been repeated for the remaining four template rows, the cache array holds the Gabor convolution results for that row. This method accelerates the algorithm on the hardware platform because it increases the cache read/write hit ratio.
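The following Python sketch mirrors the row-wise scheme described above: for each output row, each template row contributes a 1-D correlation over one contiguous image row, and these partial results are accumulated into a single row buffer (the cache array). It computes correlation, i.e. the template is not flipped, which equals convolution for symmetric templates; a direct 2-D version is included for comparison. In Python this only illustrates the memory access pattern; the cache benefit described in the text applies to compiled code on the DSP. The function names are hypothetical.

```python
def row_correlate_accumulate(image, template):
    """'Valid' 2-D correlation computed row by row (1-D local method).

    For output row r, template row i contributes a 1-D correlation of image
    row r+i with that template row; the partial results are accumulated in
    one row buffer (the 'cache array').
    """
    th, tw = len(template), len(template[0])
    h, w = len(image), len(image[0])
    out_w = w - tw + 1
    result = []
    for r in range(h - th + 1):
        acc = [0.0] * out_w          # row buffer shared by all template rows
        for i in range(th):
            src = image[r + i]       # read one contiguous image row at a time
            coeffs = template[i]
            for c in range(out_w):
                s = 0.0
                for j in range(tw):
                    s += coeffs[j] * src[c + j]
                acc[c] += s
        result.append(acc)
    return result


def direct_correlate(image, template):
    """Reference 2-D 'valid' correlation for checking the row-wise version."""
    th, tw = len(template), len(template[0])
    return [[sum(template[i][j] * image[r + i][c + j]
                 for i in range(th) for j in range(tw))
             for c in range(len(image[0]) - tw + 1)]
            for r in range(len(image) - th + 1)]


print(row_correlate_accumulate([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]]))
# [[6.0, 8.0], [12.0, 14.0]]
```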
4. Experimental Results Analysis
We used the twill images of the public TILDA database as test images. In this section, we first discuss the relationship between the detection ratio and the algorithm parameters. We then compare our proposed method with other methods, using the detection ratio as the evaluation criterion. Finally, we analyse the hardware acceleration.
4.1. Relationship between Detection Ratio and Algorithm Parameters
The size of image patches is an important parameter in our algorithm; if it is too small, the detection result is easily affected by noise such as illumination changes and fluctuations in the fabric grain. As shown in Fig.7, when 4*3-pixel image patches are used, the detection ratio is always below 90%, regardless of the dictionary scale. Fig.7 also shows that obtaining a high detection ratio and a low false ratio requires an appropriate patch size for a fixed SSOCD scale, and an appropriate SSOCD scale for a fixed patch size. Although a larger patch size or SSOCD scale can improve the detection ratio and reduce the false ratio, it greatly increases the amount of computation, and a larger patch size does not necessarily give a better detection ratio. With 12*10-pixel patches, the detection ratio exceeds 90% when the SSOCD scale is greater than 24. However, 6*5 and 9*7 patches also give a detection ratio above 90% with a small dictionary scale, because these sizes are close to the period of our experimental twill samples. Therefore, an acceptable detection ratio can be achieved with 6*5 or 9*7 image patches and a dictionary scale of 16.
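The detection ratio and false ratio are not defined in this section. Assuming the common definitions (the fraction of defective samples that are flagged, and the fraction of defect-free samples that are wrongly flagged), they can be computed as in the sketch below; the function name and the sample labels are hypothetical.

```python
def detection_and_false_ratio(predicted, actual):
    """Return (detection ratio, false ratio) for per-sample boolean labels.

    predicted, actual: equal-length lists of booleans, True = defective.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    n_defect = sum(1 for a in actual if a)
    n_normal = len(actual) - n_defect
    detection_ratio = tp / n_defect if n_defect else 0.0
    false_ratio = fp / n_normal if n_normal else 0.0
    return detection_ratio, false_ratio


print(detection_and_false_ratio([True, True, False, True, False],
                                [True, True, True, False, False]))
# (0.6666666666666666, 0.5)
```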

Fig. 7. Detection ratio and false ratio
The most suitable image patch size was found to be close to the period of our fabric. Generally, we set the dictionary scale slightly larger than half the number of pixels in an image patch (for example, 16 for a 6*5 = 30-pixel patch). Detection results via SC are shown in Fig.8.

Fig. 8. Detection result of twill using SC
4.2. Comparison with Other Detection Methods
In our experiments on the TILDA database, the SC-based method was compared with the following methods in terms of detection performance: 1) ICA: fabric defect detection was implemented by replacing SC with ICA in the training and projection steps, and the same dimension

Fig. 9. Comparison of detection ratio
4.3. Analysis of Parallel Hardware Acceleration for SC Detection
4.3.1. The Evaluation Criterion for Parallel Algorithm Performance
1. Speedup
An important indicator of whether an algorithm parallelizes well is obtained by comparing a task's execution time on a single-core processor with its execution time on a processor with N parallel cores. The speedup on N parallel cores is defined as follows:
$$S_N = \frac{T_1}{T_N}$$
where $T_1$ is the execution time on a single core and $T_N$ is the execution time on N cores.
2. Parallel efficiency
Parallel efficiency is the ratio of speedup to the number of cores and is defined as follows:
$$E_N = \frac{S_N}{N}$$
where $S_N$ is the speedup on N cores. Ideally, the speedup reaches its maximum value of N and the parallel efficiency reaches its maximum value of 1. These two criteria are used to evaluate the performance of the parallel algorithm and to guide its further improvement.
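Both criteria can be computed directly, as in the minimal sketch below (function names hypothetical). As a consistency check on the eight-core figures reported in Section 4.3.2, a speedup of 7.054 on 8 cores gives an efficiency of 7.054 / 8 = 0.88175, i.e. 88.18%, and together with the 101.057 ms eight-core time it implies a single-core time of roughly 7.054 × 101.057 ≈ 712.9 ms.

```python
def speedup(t_single, t_parallel):
    """S_N = T_1 / T_N: single-core time divided by N-core time."""
    return t_single / t_parallel


def parallel_efficiency(speedup_value, cores):
    """E_N = S_N / N: speedup divided by the number of cores."""
    return speedup_value / cores


# Eight-core speedup reported in Section 4.3.2
print(round(parallel_efficiency(7.054, 8), 5))
# Single-core time implied by that speedup and the 101.057 ms eight-core time
print(round(7.054 * 101.057, 1))
```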
4.3.2. Experimental Results and Analysis
For the twill images in the TILDA public database, the image size is 512*768, the Gabor filtering convolution template is 5*5, the final scale K of the SSOCD is 16 and the image patch size is 6*5. First, a single-core serial version of the detection algorithm was implemented in Matlab on a PC with an Intel Core 2 Duo E7300 CPU @ 2.66 GHz and 2.00 GB of memory. The detection algorithm was then ported to a TMS320C6678 evaluation board, with single-core serial and multi-core parallel implementations and optimizations. Finally, we analysed the performance of our detection algorithm from the experimental results.
Table 1 shows that, for floating-point computation, the execution time on the PC platform was 58 times that on a single core of the TMS320C6678, owing to the hardware acceleration algorithm and the powerful computing capacity of the DSP. In addition, on the TMS320C6678, single-core fixed-point computation was faster than single-core floating-point computation.
Table 1. Performance test on different platform environments
As shown in Table 2, under the fixed-point environment of the DSP, we measured the time needed to inspect one fabric image using different numbers of cores. Fig.10 shows that the speedup increases steadily as more cores are added, whereas the parallel efficiency decreases. Because of bus contention between the cores, neither the speedup nor the parallel efficiency reaches its ideal maximum. With all eight cores, the average processing time was 101.057 ms, the speedup reached 7.054 and the parallel efficiency was 88.18%. The algorithm presented in this paper is therefore well suited to the parallel detection of fabric defects.
Table 2. Time consumption for different numbers of cores

Fig. 10. Left: speedup versus the number of cores. Right: parallel efficiency versus the number of cores.
5. Conclusion
In this paper, we presented the detailed design and implementation of a parallel acceleration algorithm via SC for fabric defect detection. First, a feature split was used to accelerate SSOCD feature extraction in order to meet real-time detection requirements. Second, an optimized synchronization and communication method reduced the serial portion of the algorithm's running time, allowing the system to achieve a good parallel speedup ratio and parallel efficiency. Moreover, optimizing the DSP program to make full use of parallel processing significantly reduced the algorithm's processing time. The experiments show that the proposed algorithm runs with high parallel efficiency and that its detection speed meets the requirements of industrial inspection.
