Sage Journals: Discover world-class research

Abstract

Fault-influencing factors analysis is an important part of the quality supervision process. High-voltage switchgears serve a dual function in power transmission lines: they switch off electric circuits and protect them. Faults in these devices seriously affect power grid operating efficiency, factory operation, and residents' lives, and they cause economic losses. Because traditional methods have difficulty analyzing fault-influencing factors accurately and comprehensively, this article proposes a novel method based on industrial big data for analyzing the fault-influencing factors of high-voltage switchgears during quality supervision, integrating qualitative and quantitative analyses. In this model, Classification Based on Multiple Class-Association Rules based on a Gaussian Mixture Model is adopted as the qualitative analysis method to analyze fault-influencing factors comprehensively over the whole life cycle of high-voltage switchgears and to supply discrete interval value ranges for these factors. Building on the qualitative analysis, a logistic regression model is constructed to calculate fault occurrence probabilities quantitatively, including the single-fault occurrence probability and the multiple-faults joint occurrence probability. In addition, the single-fault occurrence probability is used to refine the discrete interval value ranges produced by the qualitative analysis, making these ranges more accurate. Consequently, the proposed method can provide an important reference for the operation and maintenance of high-voltage switchgears and makes it possible to design accurate maintenance plans before equipment failure. A final case study demonstrates the effectiveness of the proposed methodology.

Keywords

Industrial big data, equipment quality supervision, qualitative and quantitative analyses integrated fault-influencing factors analysis model, quantitative fault-influencing factors analysis model

Introduction

High-voltage switchgears can not only close and interrupt normal working current but also promptly cut off overload and short-circuit currents during line faults, guaranteeing safe power grid operation. Owing to a variety of complex and latent factors, high-voltage switchgear faults, such as control loop faults, mechanical faults, and power supply system faults, occur frequently and seriously threaten safe grid operation. Fault-influencing factors analysis of high-voltage switchgears therefore matters for power grid operating efficiency, factory operation, and residents' lives. At present, both the production process of high-voltage switchgears and the products themselves are becoming increasingly intelligent, and the whole life cycle of a switchgear produces large volumes of heterogeneous data, including design and production parameters, test and inspection parameters, environment and operation parameters during operation, and regular spot inspection data during maintenance, which makes fault-influencing factors analysis more complicated. Fault-influencing factors analysis of high-voltage switchgears has the following characteristics: (1) Wide range of fault-influencing factors: operation data come from regional substations, whereas design, production, and test data come from design, production, testing, and other enterprises. (2) Diversity of fault-influencing factors: the analysis must cope with diverse natural environments, diverse equipment parameters, diverse data types, and so on. (3) Qualitative analysis dominated by human experience has difficulty analyzing and diagnosing faults accurately.
Therefore, it is necessary to start from big data analysis over the whole life cycle of high-voltage switchgears, to explore a quantitative method for analyzing fault-influencing factors, and to obtain the quantitative relationship between changes in fault-influencing factors and fault occurrence probability. Such a relationship supports preventive maintenance and reduces fault frequency and severity during operation, which is of great practical significance for providing high-quality and safe electric power to a rapidly developing national economy.

Many data-driven methods have been proposed for analyzing high-voltage switchgear fault-influencing factors. Ni et al.¹ proposed an approach based on adaptive kernel principal component analysis (PCA) and support vector machines for real-time fault diagnosis of high-voltage circuit breakers (HVCBs); this approach balanced real-time performance and accuracy, handled various fault diagnosis situations for HVCBs, and detected and recognized faults efficiently. To perform online fault diagnosis of transmission lines, Zhao et al.² proposed a hybrid method combining stochastic time domain simulation and history-driven differential evolution to generate simulated fault and system data, which improved the computational speed of fault diagnosis and handled possible malfunctions of protective relays and circuit breakers. Wang et al.³ developed a method based on Fuzzy Reasoning Spiking Neural P systems to handle incompleteness and uncertainty in power transmission network fault diagnosis; this approach offered intuitive graphical models and an understandable diagnosis model-building process. Groenewald and Aldrich⁴ proposed a root cause analysis method based on causality maps and extreme learning machines for highly nonlinear systems. Tian et al.⁵ presented a fault diagnosis method for transistor open-circuit faults that combines current kernel density estimation, Euclidean distance, and fault detection and isolation, and used it to analyze the influence of thermal cycling and voltage surges. Leone et al.⁶ applied a data-driven prognostic algorithm to estimate the remaining useful life of a product, which played an important role in the preventive maintenance of medium-voltage circuit breakers and HVCBs.
Zhang et al.⁷ and Huang et al.⁸ introduced mechanical fault diagnosis methods for HVCBs based on variational mode decomposition and support vector machines, extracting time segmentation energy entropy to construct feature vectors that describe the energy distribution of HVCB vibration signals in the time and frequency domains. Zhu et al.⁹ proposed an adaptive fault diagnosis method for HVCBs based on particle swarm optimization-support vector domain description and particle swarm optimization kernel-based Fuzzy C-means. Ramentol et al.¹⁰ presented an imbalanced learning preprocessing algorithm, SMOTE-FRST-2T, which combines the well-known synthetic minority oversampling technique with an instance selection strategy based on Fuzzy Rough Set Theory to predict maintenance needs. Lin et al.¹¹ constructed a hierarchical assessment index system based on intelligent electronic device monitoring data and proposed a method that integrates Fuzzy Set Theory to analyze the circuit breaker condition assessment indices provided by these devices. To improve the identification accuracy of HVCB mechanical fault types without training samples, Wang et al.¹² introduced a Bayesian mechanical fault diagnosis method using a hybrid classifier constructed with support vector data description. Razi-Kazemi et al.¹³ developed a diagnosis framework for online monitoring systems that assesses circuit breaker condition accurately during operation and foresees failure risk, using a data-mining process to cluster captured data against past-recorded circuit breaker diagnosis data. Luo et al.¹⁴ applied a wireless instrumented milling cutter system with embedded polyvinylidene difluoride (PVDF) sensors to protect the milling process from loopholes and different types of faults.
Rawat et al.¹⁵ proposed a multilayered fault estimation classifier based on dominance-based rough sets that protects the system from vulnerabilities and different kinds of faults and provides robust fault diagnosis using the status of intelligent electronic devices and circuit breakers, which can be tripped by any kind of fault. Köksal, Batmaz, and Testik¹⁶ published a review of data-mining applications for quality improvement in the manufacturing industry, whereas Zhang et al.¹⁷ and Chang et al.¹⁸ presented big data analysis architectures for the entire product life cycle without discussing the underlying analysis techniques in depth. Based on supervisory control and data acquisition (SCADA) systems, Astolfi et al.¹⁹ and Simons and Cheung²⁰ studied data-mining algorithms to analyze the relationship between faults and influence factors during product operation, and demonstrated the performance and validity of their methods with production operation examples. Cai et al.²¹ studied the fault correlation of high-voltage switchgears, integrating data mining, Fuzzy Set Theory, Bayesian networks, expert comprehensive judgment, and other data analysis methods over the relevant reliability indices and influence factors to support the inspection and maintenance of HVCBs. To apply massive manufacturing quality data effectively to quality analysis in manufacturing enterprises, He et al.²² proposed a generalized health prognosis method for a cylinder-head manufacturing system based on the deep fusion of quality-oriented big data from the operational process of manufacturing systems. Drawing on a wide range of production operations and statistical analysis, He et al.²³ also established the mapping relationship between the reliability of the produced products and the mission reliability of the manufacturing system to build a mission reliability model.
Based on real-time management of data and information, Menon et al.²⁴ reviewed product life cycle management challenges (not specific to high-voltage switchgears) in detail from the existing literature and presented solutions using industrial Internet platform openness and its related dimensions and sub-dimensions. These methods have played a significant role in fault analysis, factor identification, maintenance, and life prediction of high-voltage switchgears. However, most of these studies focus on fault analysis during equipment operation; their influence factors analysis does not cover all phases of the product life cycle and cannot reveal the root causes of faults. At the same time, most of these studies cannot exploit the massive data generated over the whole product life cycle, so they fall short of the requirements of fault-influencing factors analysis at that scale. In addition, most of these methods are qualitative; quantitative analysis is rare, so they cannot supply quantitative results. Because high-voltage switchgears operate under highly intelligent and complex conditions, existing fault-influencing factors analysis cannot solve these problems effectively, which makes equipment maintenance difficult and limited.

Accordingly, this article introduces industrial big data analysis into fault-influencing factors analysis of high-voltage switchgears and constructs a fault-influencing factors set. A method integrating Classification Based on Multiple Class-Association Rules (CMAR) based on a Gaussian Mixture Model (GMM) with logistic regression is proposed, covering both qualitative and quantitative analyses. In the proposed method, association rules between fault-influencing factors and faults are determined with CMAR to obtain discrete interval value ranges of the fault-influencing factors for each fault type. On the basis of this qualitative analysis, a logistic regression model is presented to calculate the single-fault occurrence probability, which is used to refine the discrete interval value ranges, and the multiple-faults joint occurrence probability. Based on the refined ranges, maintenance strategies can be developed in advance to support preventive maintenance during the operation of high-voltage switchgears.

The rest of this article is organized as follows. The “Method of high-voltage switchgears fault-influencing factors analysis based on industrial big data” section introduces the high-voltage switchgear fault-influencing factors analysis model based on industrial big data, in which qualitative and quantitative analyses based on CMAR and logistic regression are proposed. In the section “Application case of high-voltage switchgears fault-influencing factors analysis,” the proposed method is validated on a case study, and a big data analysis platform with parallel computing is built to demonstrate its effectiveness. Finally, conclusions are given in the “Conclusion” section.

Method of high-voltage switchgears fault-influencing factors analysis based on industrial big data

To analyze the fault-influencing factors of high-voltage switchgears, a method is proposed that integrates qualitative and quantitative analyses based on CMAR and logistic regression, respectively. The method provides specific discrete interval value ranges for different fault-influencing factors and the single-fault occurrence probability, and it can also provide the multiple-faults joint occurrence probability.

The main procedure of the proposed method is illustrated in Figure 1. The entire process can be divided into a qualitative stage and a quantitative stage on the basis of CMAR and logistic regression, respectively. The four key steps of the proposed method are as follows:

Data preparation: collect the whole life cycle fault data of high-voltage switchgears as the fault data set, and build the data set for fault-influencing factors analysis through data preprocessing and discretization of continuous data.

Qualitative analysis: apply CMAR based on GMM as the qualitative analysis method to obtain association rules between faults and fault-influencing factors, together with the associated discrete interval value ranges.

Quantitative analysis: apply logistic regression as the quantitative analysis method to obtain the single-fault occurrence probability; its input data set is built from the associated discrete interval value ranges obtained by qualitative analysis.

Intervals modification: refine the discrete interval value ranges obtained by qualitative analysis using the single-fault occurrence probability, and obtain the multiple-faults joint occurrence probability.

Figure 1.

Method of qualitative and quantitative analyses based on CMAR and logistic regression.

Data resource processing and qualitative analysis based on GMM and CMAR

The fault data of high-voltage switchgears consist of continuous and discrete data. Before constructing a qualitative analysis model, all continuous attributes in the fault data need to be discretized. The CMAR method based on GMM is used to process data and supply data for qualitative and quantitative analyses.

1. Discretization of continuous attributes based on GMM.

Assume a fault-influencing factor with observations $X = \{x_{1}, x_{2}, \dots, x_{n}\}$ whose distribution is composed of K Gaussian distributions; K can be interpreted as the number of clusters into which the sample points are grouped. $π_{k}$ is the mixing weight of the kth Gaussian distribution, that is, its contribution to the data points, $θ_{k}$ is the parameter of the kth single Gaussian distribution, and $Θ$ is the parameter space. The GMM is defined in equation (1), where $p_{k}(x \mid θ_{k})$ is the probability density function of the kth single Gaussian distribution, given by $N(x \mid μ_{k}, σ_{k})$ in equation (2), with mean $μ_{k}$ and standard deviation $σ_{k}$.²⁵

p(x \mid Θ) = \sum_{k = 1}^{K} π_{k} p_{k}(x \mid θ_{k})

(1)

p_{k}(x \mid θ_{k}) = N(x \mid μ_{k}, σ_{k}) = \frac{1}{\sqrt{2π}\,σ_{k}} \exp\left(-\frac{(x - μ_{k})^{2}}{2σ_{k}^{2}}\right)

(2)

In these two equations, $π_{k} \in [0, 1]$, $\sum_{k = 1}^{K} π_{k} = 1$, and $Θ = \{π_{1}, \dots, π_{K}, θ_{1}, \dots, θ_{K}\}$.

The logarithm likelihood function $L_{Θ}$ of the GMM is as shown in equation (3)

L_{Θ} = \log \prod_{i = 1}^{n} p(x_{i} \mid Θ) = \sum_{i = 1}^{n} \log \sum_{k = 1}^{K} π_{k} N(x_{i} \mid μ_{k}, σ_{k})

(3)

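To discretize a continuous attribute, the GMM parameters $Θ$ (the weights $π_{k}$, means $μ_{k}$, and standard deviations $σ_{k}$) are estimated by maximizing $L_{Θ}$, for example with the expectation-maximization algorithm, and the fitted GMM is then used to cluster the data into the single Gaussian components, each of which corresponds to one discrete interval of the attribute.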
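The fitting and clustering step can be sketched for a single continuous attribute as below. This is a minimal illustration, not the article's implementation: it assumes the parameters are estimated with the expectation-maximization (EM) algorithm, initializes the means at evenly spaced quantiles, floors the standard deviation at a hypothetical `min_sigma`, and assigns each value to the component with the highest posterior probability. The helper names `fit_gmm_1d`, `discretize`, and `intervals` are illustrative.

```python
import math


def normal_pdf(x, mu, sigma):
    """Gaussian density N(x | mu, sigma) from equation (2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def fit_gmm_1d(xs, k, n_iter=200, min_sigma=1e-3):
    """Fit a K-component 1-D GMM by EM.

    Returns (weights, means, sigmas, log_likelihood); the log-likelihood is
    L_Theta from equation (3), evaluated at the final parameters.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic initialisation: means at evenly spaced quantiles of the data
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mu_all) ** 2 for x in xs) / n)
    sigmas = [max(sd_all, min_sigma)] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights pi_k, means mu_k and standard deviations sigma_k
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)

    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)) or 1e-300)
        for x in xs
    )
    return weights, means, sigmas, loglik


def discretize(xs, weights, means, sigmas):
    """Label each value with the component of highest posterior probability."""
    labels = []
    for x in xs:
        post = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
        labels.append(max(range(len(post)), key=post.__getitem__))
    return labels


def intervals(xs, labels):
    """Return the [min, max] value range covered by each component, sorted by lower bound."""
    groups = {}
    for x, lab in zip(xs, labels):
        lo, hi = groups.get(lab, (x, x))
        groups[lab] = (min(lo, x), max(hi, x))
    return sorted(groups.values())
```

Fitting this for K = 1, …, m and keeping the returned log-likelihoods provides exactly the inputs that the BIC-based selection of the next subsection compares.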

2. Determination of the Optimal GMM by Bayesian Information Criterion

The number of mixture components K of the GMM must be specified in advance. If K is too large, the fitting error will be very small, but the model may fit noise in the data and overfit; if K is too small, the model underfits and cannot capture the underlying data trend.

The Bayesian Information Criterion (BIC), derived from Bayesian theory, is a model selection criterion that balances a model's ability to explain the data against its complexity. BIC is formally defined in equation (4)

\mathrm{BIC} = L_{Θ} - k \ln n

(4)

where k is the number of free parameters in the parameter space $Θ$, $L_{Θ}$ is the log-likelihood of the model, n is the sample size, and $k \ln n$ is a penalty term for model complexity. Under this definition, the optimal model is the one with the maximum BIC.²⁶

Each continuous attribute in the fault data is discretized separately. Given continuous observations $X = \{x_{1}, x_{2}, \dots, x_{n}\}$, a GMM is fitted for each number of components K = 1, 2, …, m, where m is the maximum number of components considered. The GMM with the maximum BIC value is then selected as the model for discretizing the attribute.
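The selection rule can be sketched as follows, taking $L_{Θ}$ as the maximized log-likelihood of each candidate GMM. The parameter count 3K − 1 (K means, K standard deviations, K − 1 independent weights) assumes a univariate GMM, and the log-likelihood values in the example are hypothetical; the function names are illustrative.

```python
import math


def gmm_free_params(k):
    """Free parameters of a 1-D GMM with k components:
    k means + k standard deviations + (k - 1) independent mixing weights."""
    return 3 * k - 1


def bic(log_likelihood, n_params, n):
    """Equation (4): BIC = L_Theta - k ln n (the larger, the better)."""
    return log_likelihood - n_params * math.log(n)


def select_components(logliks, n):
    """Return the K with maximum BIC and the score of every candidate.

    logliks maps each candidate K = 1..m to the maximized log-likelihood L_Theta
    of the GMM fitted with K components on n observations.
    """
    scores = {k: bic(ll, gmm_free_params(k), n) for k, ll in logliks.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    # Hypothetical log-likelihoods for m = 4 candidate GMMs fitted on n = 200 values
    example = {1: -620.0, 2: -540.0, 3: -532.0, 4: -529.0}
    best, scores = select_components(example, n=200)
    print(best, {k: round(v, 1) for k, v in scores.items()})
```

With these numbers K = 2 wins: moving to K = 3 gains 8 in log-likelihood but costs 3 ln 200 ≈ 15.9 in penalty. Note that the commonly quoted Schwarz form, $-2 \ln L + k \ln n$ (minimized), ranks models like $L_{Θ} - \tfrac{k}{2} \ln n$, so equation (4) penalizes complexity twice as strongly; the sketch follows equation (4).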

3. Construction of qualitative analysis model based on CMAR

The CMAR algorithm²⁷ was developed from the classification-based association (CBA) method.²⁸ Both CMAR and CBA build on association rule mining. CMAR adapts the frequent pattern (FP)-growth method to mine rule sets that satisfy minimum support and minimum confidence, and it uses several strong association rules together to determine the class label of a new sample. Unlike traditional association analysis, classification based on association rules requires every mined pattern to be tied to a class label; such rules are called Class-Association Rules (CARs), and CMAR is a classification method that mines a set of CARs.²⁹ The difference between CMAR and FP-growth is that FP-growth is an association rule algorithm, whereas CMAR is a classification algorithm: CMAR mines strong associations between frequent attribute sets and class labels from the training samples, which places stricter requirements on the FP-tree construction process.

The main reasons for adopting CMAR in the qualitative model are as follows:

The fault data carry fault class labels, which suits supervised learning. For the scenario discussed in this article and for switchgear data, this method is therefore more suitable than plain association rule mining.

The classifier obtained by CMAR is a descriptive, rule-based model that is easy to understand. It is suitable for qualitative analysis of fault-influencing factors and can be used to construct a rule-based qualitative analysis model.

Compared with the traditional CBA method, CMAR combined with FP-growth greatly improves running efficiency and scales to large data sets, which makes it well suited to qualitative analysis of fault-influencing factors.

As shown in Figure 2, the CMAR-based analysis consists of two parts. First, strong relationships between frequent attribute sets and class labels are mined from the high-voltage switchgear training samples, and the resulting association rule sets for each fault class are stored in the CR-Tree (Class Rules Tree). Then, the rules that satisfy the constraint conditions are selected and sorted by confidence and lift, yielding the qualitative fault-influencing factors.
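For reference, the standard definitions of the measures used to filter and rank a class-association rule $P \to c$ (attribute pattern P, fault class c) over N training samples are given below; count(·) denotes the number of samples containing the given items, a notation introduced here for illustration. Table 1 appears to report support as an absolute count, that is, the numerator of the support ratio.

```latex
\mathrm{sup}(P \to c) = \frac{\mathrm{count}(P \cup \{c\})}{N}, \qquad
\mathrm{conf}(P \to c) = \frac{\mathrm{count}(P \cup \{c\})}{\mathrm{count}(P)}, \qquad
\mathrm{lift}(P \to c) = \frac{\mathrm{conf}(P \to c)}{\mathrm{count}(c) / N}
```

For example, with hypothetical counts N = 1000, count(P) = 100, count(P ∪ {c}) = 80, and count(c) = 200, the rule has support 0.08, confidence 0.8, and lift 0.8 / 0.2 = 4; a lift above 1 means that the pattern makes fault c more likely than its base rate.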

Figure 2.

Fault-influencing factors qualitative analysis based on CMAR.

CMAR can process high-voltage switchgear data efficiently, which improves the effectiveness of fault-influencing factors analysis. Because it is a supervised method, the rule sets contained in the CMAR classifier can fully capture the discrete interval value ranges, and the mined ranges generally become more reliable as the data set grows.

The calculation process is as follows:

Scan the fault-influencing factors data set once and find the set F of “attribute-classification” pairs that satisfy the given minimum support (minsup), as shown in Table 1.

Sort F in descending order of support count, so that the most frequent “attribute-classification” pairs appear first in the rule sets.

Scan the data set S a second time and build the CR-Tree with class labels. For each transaction $T_{k}$ in S, insert the “attribute-classification” pairs of $T_{k}$ that appear in F into the CR-Tree in the order of F, and attach the class label of $T_{k}$ to the node of the last inserted pair.

Mine the frequent patterns from the bottom of the tree to the top. In each iteration, the class distribution of a node is merged into its parent node, analogous to frequent pattern generation in the FP-Tree, and the frequent patterns with class labels are finally obtained.²⁷

Table 1.

Frequent rules set F of “attribute and classification.”

TID	Rule	Support	Confidence
1	$abc \to A$	80	80%
2	$abcd \to A$	63	90%
3	$abce \to B$	36	60%
4	$bcd \to C$	210	70%

TID: transaction identifier.
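The two scans described above can be approximated by the brute-force sketch below. It is a simplification rather than CMAR itself: it enumerates pattern combinations directly instead of building the CR-Tree, so it is only practical for a handful of attributes, and it omits CMAR's rule pruning and its weighted χ² multi-rule classification. The function `mine_cars`, the `attribute=value` item encoding, and the toy transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_cars(transactions, min_sup, min_conf, max_len=3):
    """Brute-force mining of class-association rules (CARs).

    transactions: list of (items, label) pairs, items being a set of "attribute=value" strings.
    min_sup: minimum absolute support count; min_conf: minimum confidence in [0, 1].
    Returns (pattern, label, support, confidence) tuples sorted by confidence, then support.
    """
    # Scan 1: frequent items, ordered by descending support (ties broken alphabetically)
    item_counts = Counter(item for items, _ in transactions for item in items)
    f_list = sorted((it for it, c in item_counts.items() if c >= min_sup),
                    key=lambda it: (-item_counts[it], it))
    rank = {it: r for r, it in enumerate(f_list)}

    # Scan 2: keep only the frequent items of each transaction, in F-list order,
    # and count every sub-pattern with and without its class label
    pattern_counts, rule_counts = Counter(), Counter()
    for items, label in transactions:
        ordered = sorted((it for it in items if it in rank), key=rank.get)
        for size in range(1, min(max_len, len(ordered)) + 1):
            for pattern in combinations(ordered, size):
                pattern_counts[pattern] += 1
                rule_counts[(pattern, label)] += 1

    rules = []
    for (pattern, label), sup in rule_counts.items():
        conf = sup / pattern_counts[pattern]
        if sup >= min_sup and conf >= min_conf:
            rules.append((pattern, label, sup, conf))
    rules.sort(key=lambda r: (-r[3], -r[2]))
    return rules
```

For instance, with five toy samples in which every sample at pollution level 5 is labeled F3, the top rule returned is `(("pollution=5",), "F3", 3, 1.0)`.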

Fault-influencing factors quantitative analysis of the single-fault occurrence probability and the multiple-faults joint occurrence probability based on logistic regression

On the basis of qualitative analysis, the quantitative analysis model is built for occurrence probability to get the quantitative relationship between fault and fault-influencing factors, including the construction of quantitative data set, the quantitative analysis based on logistic regression,³⁰ and the regression evaluation.

1. Build a quantitative data set. For each fault, filter the samples with reference to the discrete interval value ranges and build a quantitative data set.

2. Construct a quantitative analysis model based on logistic regression. Unlike traditional linear regression, whose dependent variable is continuous, logistic regression has a discrete dependent variable that it links to a linear predictor through the logistic function, so it is a generalized linear model; it nevertheless shares many similarities with multiple linear regression analysis. Both models are built on the linear form $W^{T}X + b$, where W is the vector of coefficients to be estimated and b is a constant (intercept) term. In logistic regression, the logistic function L maps $W^{T}X + b$ to a probability p, that is, $p = L(W^{T}X + b)$, and the value of the dependent variable is determined by comparing p with 1 − p. Regression analysis yields the relationship between the parameters and the causes of faults, and this quantitative relationship is described by the logistic regression equation within certain confidence intervals.

The steps for applying logistic regression to obtain the single-fault occurrence probability and the multiple-faults joint occurrence probability are as follows:

Step 1: Clean, transform, and reduce the collected data, as follows:

Missing data: samples with severe missing data in the main dimensions are removed. When the missing data are not severe, missing values are filled with the mean (continuous attributes) or the mode (discrete attributes).

Outliers: a probability model of the data distribution is assumed beforehand based on the characteristics of the data set, values that are inconsistent with this model at a given significance level are identified as outliers, and the identified outliers are deleted.

Scaling: to eliminate differences caused by the different dimensions (units) of continuous attributes, the data are reduced and normalized to the same scale.
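The Step 1 operations can be sketched in plain Python as below. The article does not fix the thresholds or the probability model, so this sketch assumes: a sample is dropped when more than `max_missing` of its key columns are missing, outliers are judged against a normal model with a 3σ rule (about 0.27% two-sided significance), and continuous columns are min-max normalized. All function names are hypothetical.

```python
from statistics import mean, mode, pstdev


def drop_sparse_rows(rows, key_columns, max_missing=1):
    """Remove samples with more than max_missing missing (None) values in the key columns."""
    return [r for r in rows if sum(r.get(c) is None for c in key_columns) <= max_missing]


def fill_missing(values, kind):
    """Fill None entries with the mean (continuous) or the mode (discrete) of the column."""
    present = [v for v in values if v is not None]
    fill = mean(present) if kind == "continuous" else mode(present)
    return [fill if v is None else v for v in values]


def outlier_mask(values, z_crit=3.0):
    """Flag values inconsistent with a fitted normal model.

    A value is flagged when |z| > z_crit; z_crit = 3 corresponds to a two-sided
    significance level of about 0.27%.
    """
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [False] * len(values)
    return [abs(v - mu) / sd > z_crit for v in values]


def min_max(values):
    """Rescale a continuous column to [0, 1] so that attributes share one scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `fill_missing([1.0, None, 3.0], "continuous")` yields `[1.0, 2.0, 3.0]`, and `min_max([10, 20, 30])` yields `[0.0, 0.5, 1.0]`.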

Step 2: Determine independent variables and dependent variables of the regression model according to data conditions.

First, determine the dependent variable: according to the analysis target, it is the fault type.

Second, determine the independent variables: depending on data availability, all factors other than the fault are taken as independent variables.

Step 3: The independent variables and dependent variables determined in step 2 are brought into the regression equation to establish the logistic regression model.

The fault type of high-voltage switchgears is represented by $Y \in \{1, 2, \dots, s\}$, where the values are nominal labels with no numeric meaning or order and serve only to classify faults. Suppose there are p independent variables and n samples; the sample independent variables can then be represented by the $n \times p$ matrix X, namely

X = (x_{1}, x_{2}, \dots, x_{p}) = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}

The fault regression model is the logistic regression model as shown in equation (5)

\begin{cases} P(y = 1 \mid x) = \dfrac{e^{g(x)}}{1 + e^{g(x)}} \\ P(y = 0 \mid x) = \dfrac{1}{1 + e^{g(x)}} \end{cases}

(5)

where $g(x) = α + β_{1} x_{1} + β_{2} x_{2} + \dots + β_{p} x_{p}$, $P(y = 1 \mid x)$ is the conditional probability that the fault occurs given the sample data x, and $P(y = 0 \mid x)$ is the probability that the fault does not occur. $α$ is the constant term of the regression, and $β = (β_{1}, \dots, β_{p})$ is the weight coefficient vector of the regression equation.

Step 4: The maximum likelihood method is used to determine weight coefficients of each variable in the regression equation.

Because the dependent variable of logistic regression, the fault type, is discrete, least squares estimation is no longer appropriate; instead, the regression parameters are estimated by maximum likelihood, with the log-likelihood function defined in equation (6)

\ln L(β) = \sum_{i = 1}^{m} \left\{ y_{i} \ln P(x_{i}) + (1 - y_{i}) \ln [1 - P(x_{i})] \right\}

(6)

where m is the number of samples and $P(x_{i})$ is the fault occurrence probability of sample $x_{i}$. Maximizing equation (6) is equivalent to minimizing the loss function in equation (7); the coefficients at the minimum are taken as the final regression coefficients, from which the fault occurrence probability under different fault-influencing factors is obtained. The loss function is

\mathrm{cost}(β) = -\frac{1}{m} \ln L(β) = -\frac{1}{m} \sum_{i = 1}^{m} \left\{ y_{i} \ln P(x_{i}) + (1 - y_{i}) \ln [1 - P(x_{i})] \right\}

(7)
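Steps 3 and 4 can be sketched for one fault type as follows, assuming the data have already been preprocessed into numeric features and a 0/1 label (1 meaning that the fault under study occurred). The sketch minimizes equation (7) with plain batch gradient descent, which is only one way to reach the maximum likelihood estimates (statistics packages usually use Newton-type iteratively reweighted least squares); the learning rate, iteration count, and function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function, numerically stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(alpha, beta, x):
    """P(y = 1 | x) from equation (5), with g(x) = alpha + beta . x."""
    return sigmoid(alpha + sum(b * xi for b, xi in zip(beta, x)))


def cost(alpha, beta, X, y):
    """Average negative log-likelihood, equation (7)."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(alpha, beta, xi), eps), 1 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -total / len(y)


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Minimize equation (7) by batch gradient descent; returns (alpha, beta)."""
    m, p = len(X), len(X[0])
    alpha, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_a, grad_b = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = predict_proba(alpha, beta, xi) - yi  # derivative of the loss w.r.t. g(x_i)
            grad_a += err
            for j in range(p):
                grad_b[j] += err * xi[j]
        alpha -= lr * grad_a / m
        beta = [b - lr * g / m for b, g in zip(beta, grad_b)]
    return alpha, beta
```

Fitting one such model per fault type, each on its own quantitative data set, gives the single-fault occurrence probabilities used in Step 5.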

Step 5: Calculation of the multiple-faults joint occurrence probability.

Based on the quantitative analysis of single faults, a joint probability model for multiple faults is designed. For a fault-influencing factor $X_{i}$, let A be the set of discrete intervals obtained by qualitative analysis for the s fault types $Y_{1}, Y_{2}, \dots, Y_{s}$, and let A′ be obtained by modifying A according to the single-fault occurrence probability calculated in step 4. The probability that faults $Y_{1}, Y_{2}, \dots, Y_{s}$ occur together is calculated with equation (8), where $n_{Y_{i}}$ is the number of samples of fault $Y_{i}$ and n is the total number of samples

P_{Y_{1}, Y_{2}, \dots, Y_{s}}(y = 1 \mid x \in A') = \sum_{i = 1}^{s} \frac{n_{Y_{i}}}{n} P_{Y_{i}}(y = 1 \mid x \in A')

(8)
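Equation (8) amounts to a weighted sum of the single-fault probabilities. The sketch below assumes that $n_{Y_{i}}$ is the sample count of fault $Y_{i}$ and that n is supplied by the caller as the total sample count; the function name and the example values are hypothetical.

```python
def joint_fault_probability(single_probs, fault_counts, n):
    """Equation (8): sum over faults of (n_Yi / n) * P_Yi(y = 1 | x in A').

    single_probs: fault -> single-fault occurrence probability P_Yi(y = 1 | x in A')
    fault_counts: fault -> n_Yi (assumed: number of samples of fault Y_i)
    n: total number of samples used for the weights
    """
    missing = set(single_probs) - set(fault_counts)
    if missing:
        raise ValueError(f"no sample count for faults: {sorted(missing)}")
    return sum(fault_counts[f] / n * p for f, p in single_probs.items())
```

For example, with hypothetical single-fault probabilities 0.6 for F2 and 0.3 for F3 and sample counts 30 and 10 out of n = 40, the joint probability is 0.75 × 0.6 + 0.25 × 0.3 = 0.525.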

3. Analysis and evaluation of regression results

Evaluate the regression, check that the result is reasonable, and finally obtain the fault-influencing factors and their weights. In this article, the likelihood ratio (LR) test is used to test the overall significance of the model. The test statistic LR is defined in equation (9)

LR = 2 (\ln L - \ln L_{0})

(9)

where $\ln L$ is the maximized log-likelihood of the fitted logistic regression model and $\ln L_{0}$ is the maximized log-likelihood of the model containing only the intercept. Under the null hypothesis that the intercept-only model is adequate (all coefficients $β_{j}$ are zero), LR asymptotically follows a χ² distribution whose degrees of freedom equal the number of independent variables p. The P value of LR is the probability of obtaining a statistic at least as large as the observed one if the null hypothesis were true. If the P value is very small, the observed result would be very unlikely under the null hypothesis, so the null hypothesis is rejected according to the small-probability principle, and the model fit is judged significant. When the test shows that the model is significant, the variables of the regression model can be used to screen the fault-influencing factors.
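The test can be sketched without external libraries as below, using the χ² asymptotics: the p-value is the upper tail of a χ² distribution with p degrees of freedom, computed here from the series for the regularized lower incomplete gamma function. The series is adequate for moderate statistics (very large LR values simply return a p-value of 0); in practice `scipy.stats.chi2.sf` gives the same tail probability. The function names are illustrative.

```python
import math


def chi2_sf(x, df, tol=1e-15, max_terms=10000):
    """Upper-tail probability P(X >= x) of a chi-square distribution with df degrees of freedom.

    Uses the series expansion of the regularized lower incomplete gamma
    function P(s, t) with s = df / 2 and t = x / 2.
    """
    if x <= 0:
        return 1.0
    s, t = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    for n in range(1, max_terms):
        term *= t / (s + n)
        total += term
        if term < tol * total:
            break
    lower = math.exp(s * math.log(t) - t - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def lr_test(loglik_full, loglik_null, n_predictors):
    """Likelihood ratio test of equation (9); returns (LR, p-value)."""
    lr = 2.0 * (loglik_full - loglik_null)
    return lr, chi2_sf(lr, n_predictors)
```

For example, `lr_test(-40.0, -50.0, 2)` gives LR = 20 and a p-value of $e^{-10} \approx 4.5 \times 10^{-5}$, so the null hypothesis would be rejected at any usual significance level.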

Application case of high-voltage switchgears fault-influencing factors analysis

Data processing and qualitative analysis based on GMM and CMAR

1. Data processing based on GMM and CMAR

The method proposed in this article is applied to analyze the fault-influencing factors of SF6 HVCBs. Mechanical faults account for 70% of SF6 HVCB faults, auxiliary and control loop faults for 19%, and main circuit electrical faults for 11%; the main faults of SF6 HVCBs are “operating mechanism fault” and “SF6 leakage.” According to the actual business situation, SF6 HVCB faults are divided into five categories, namely, “other faults” (F1), “operating mechanism anomaly” (F2), “SF6 leakage” (F3), “auxiliary components damage” (F4), and “main components deterioration” (F5), where “operating mechanism anomaly” covers abnormalities of the mechanism and of the transmission system.

The SF6 HVCB fault sample data set mainly comes from breaker failure records collected by a power supply company in multiple regions. The information attributes of the fault data samples are shown in Table 2 and consist mainly of equipment parameter information, environment parameter information, operation parameter information, and fault type information; the number of discrete attributes is 7, and the number of continuous attributes is 4.

Table 2.

Information attributes of samples data.

Information type	No.	Factors	Unit	Variable types
Equipment parameters	1	Production company		Discrete variable (4C)
	2	Equipment type		Discrete variable (4C)
	3	Operating mechanism type		Discrete variable (3C)
	4	Mechanical endurance	Count	Discrete variable (2C)
Environment parameters	5	Environment pollution level		Discrete variable (5C)
	6	Environment temperature	°C	Continuous variable
	7	Environment humidity	Percentage	Continuous variable
Operation parameters	8	Annual average load level		Discrete variable (4C)
	9	Operating mechanism frequency	Count	Continuous variable
	10	Current breaking frequency	Count	Continuous variable
	11	Use time	Year	Continuous variable
	12	Fault removal time	Hour	Continuous variable
Fault type		Fault type		Discrete variable

The fault sample data of high-voltage switchgears from region 1 are shown in Table 3, and all attributes are checked against constraint conditions derived from the specific business conditions. For example, the discrete variable “operating mechanism type” can only take one of the three values “MecType1,” “MecType2,” and “MecType3.” For the continuous variable “environment temperature,” the operating temperature of HVCBs should lie between −40°C and 40°C according to the GB standard, so −45°C to 45°C is taken as the valid range of this attribute, and values outside this interval are treated as abnormal. All other attributes are checked in the same way, and the abnormal values are eliminated. The multiple imputation method is used to estimate the missing values of continuous attributes: four complete data sets are generated by imputing four times, and the regression prediction method is then used to fill the continuous missing values. Missing values of discrete attributes are filled with the mode of the attribute; for example, a missing “environment pollution level” is filled with the most frequent level, level d.
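The constraint check described above can be sketched as follows. Only the two rules quoted in the text are encoded; the field names, the constant tables, and the function names are hypothetical, and missing values (None) are deliberately passed through so that the imputation step can handle them.

```python
# Hypothetical field names; only the two constraints quoted in the text are encoded.
ALLOWED_VALUES = {
    "operating_mechanism_type": {"MecType1", "MecType2", "MecType3"},
}
VALID_RANGES = {
    # GB operating range is -40..40 degC; the valid range is widened to -45..45 degC
    "environment_temperature": (-45.0, 45.0),
}


def is_abnormal(record):
    """Return True if any non-missing value breaks a business constraint."""
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            return True
    for field, (lo, hi) in VALID_RANGES.items():
        value = record.get(field)
        if value is not None and not lo <= value <= hi:
            return True
    return False


def drop_abnormal(records):
    """Keep only the records that satisfy every constraint."""
    return [r for r in records if not is_abnormal(r)]
```

For example, a record with `"environment_temperature": 50.0` is rejected, whereas sample 1 of Table 3 (MecType1, 15°C) passes.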

Table 3.

Fault sample data set 1 of high-voltage switchgears operation process (part).

No.	Production company	Equipment type	Operating mechanism type	Annual average load level	Environment pollution level	Environment temperature	Operating mechanism frequency (ABC)	Current breaking frequency	Use time	Fault type
1	ProCmp1	EquType1	MecType1	40%–60%	5	15	210	1	3488	SF6 leakage
2	ProCmp2	EquType2	MecType2	40%–60%	5	10	180	2	3312	Other faults
3	ProCmp1	EquType2	MecType2	40%–60%	5	25	170	1	5737	SF6 leakage
4	ProCmp2	EquType2	MecType3	40%–60%	5	25	328	1	5489	Operating mechanism anomaly
…	…	…	…	40%–60%	4	26	123	11	4465	Auxiliary components damage
10	ProCmp2	EquType2	MecType3	40%–60%	4	26	31	12	4661	Operating mechanism anomaly
…	…	…	…	…	…	…	…	…	…	…
155	ProCmp2	EquType2	MecType2	40%–60%	4	26	81	3	4660	SF6 leakage

The “use time” of the equipment is obtained by subtracting the time the equipment was put into service from the time of fault occurrence.
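A minimal sketch of this subtraction is given below. It assumes the use time is counted in days, which matches the magnitudes in Table 3 (e.g. 3488) even though Table 2 lists the unit as years; the function name and the dates are hypothetical.

```python
from datetime import date


def use_time_days(commissioned: date, fault_occurred: date) -> int:
    """Use time = fault occurrence date minus commissioning date, in days."""
    if fault_occurred < commissioned:
        raise ValueError("fault date precedes commissioning date")
    return (fault_occurred - commissioned).days


# Hypothetical dates, for illustration only.
print(use_time_days(date(2010, 1, 1), date(2011, 1, 1)))  # 365
```

Rejecting a fault date earlier than the commissioning date doubles as one more abnormal-value check.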

PCA is used to reduce the continuous attributes. The continuous attributes are nos. 6, 7, 9, 10, 11, and 12, and the cumulative contribution rate of the four attributes nos. 6, 9, 10, and 11 is 0.987, so these four attributes are selected as the principal components, that is, the reduced subset of continuous attributes. For the discrete attributes, the Kullback–Leibler divergence is used for reduction: the Kullback–Leibler divergence of every discrete attribute is calculated first and sorted in descending order, the results are shown in Table 4, and the first two attributes are chosen as the selected feature subset. Attribute reduction therefore removes the discrete attributes nos. 1, 2, 3, and 4 and keeps nos. 5 and 8. After attribute reduction, the data set contains attributes nos. 6, 9, 10, 11, 5, and 8.
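The cumulative-contribution criterion used above can be sketched as follows. This is an illustration under stated assumptions: the eigenvalues are hypothetical (in practice they could come from, e.g., `numpy.linalg.eigvalsh` on the correlation matrix of the standardized attributes), the threshold is a free parameter, and the text does not specify how component contributions were attributed back to the original attributes nos. 6, 9, 10, and 11, so only the rate computation is shown.

```python
def cumulative_contribution(eigenvalues):
    """Cumulative contribution rates of principal components, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    rates, running = [], 0.0
    for value in ordered:
        running += value
        rates.append(running / total)
    return rates


def n_components_for(eigenvalues, threshold):
    """Smallest number of components whose cumulative rate reaches the threshold."""
    for k, rate in enumerate(cumulative_contribution(eigenvalues), start=1):
        if rate >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of the covariance matrix of standardized attributes.
eigs = [4.0, 3.0, 2.0, 1.0]
print(cumulative_contribution(eigs))  # [0.4, 0.7, 0.9, 1.0]
print(n_components_for(eigs, 0.85))   # 3
```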

Table 4.

The Kullback–Leibler divergence sequence table of discrete attributes.

Attribute	No. 8 (annual average load level)	No. 5 (environment pollution level)	No. 1 (production company)	No. 2 (equipment type)	No. 3 (operating mechanism type)	No. 4 (mechanical endurance)
Kullback–Leibler divergence value	0.72470	0.71032	0.59469	0.47863	0.29248	0.10664
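The text does not state which distributions the Kullback–Leibler divergence compares. The sketch below assumes one plausible choice: for each discrete attribute A, the divergence of P(A | fault type) from P(A), weighted by P(fault type). This weighted sum equals the mutual information between A and the fault type. The helper names are hypothetical, and only the final ranking step uses numbers from the article (Table 4).

```python
import math
from collections import Counter, defaultdict


def kl_divergence(p, q):
    """D_KL(p || q) in nats for dicts mapping category -> probability."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def attribute_score(values, labels):
    """ASSUMED score: sum over fault types c of P(c) * D_KL(P(A | c) || P(A))."""
    n = len(values)
    marginal = {k: c / n for k, c in Counter(values).items()}
    by_class = defaultdict(list)
    for value, label in zip(values, labels):
        by_class[label].append(value)
    score = 0.0
    for members in by_class.values():
        m = len(members)
        conditional = {k: c / m for k, c in Counter(members).items()}
        score += (m / n) * kl_divergence(conditional, marginal)
    return score


def top_k(scores, k=2):
    """Attribute names sorted by descending score, keeping the first k."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Ranking the values reported in Table 4 keeps attributes no. 8 and no. 5.
table4 = {"No. 8": 0.72470, "No. 5": 0.71032, "No. 1": 0.59469,
          "No. 2": 0.47863, "No. 3": 0.29248, "No. 4": 0.10664}
print(top_k(table4))  # ['No. 8', 'No. 5']
```

Under this assumed score, an attribute whose distribution does not change across fault types scores 0, and one that determines the fault type completely scores the entropy of the fault type.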

Continuous attributes are discretized by GMM clustering, with the number of mixture components $K \in [1, m]$, where m is the maximum number of components considered; m = 15 in this article. The GMM with the largest BIC value is taken as the optimal model, and its number of Gaussian components gives the number of clusters of the continuous attribute, that is, the number of discrete intervals. Taking “environment temperature” as an example, Figure 3 shows how the BIC value changes with the number of Gaussian components. The BIC value is largest with four components, so the GMM of “environment temperature” consists of four single Gaussian distributions. The GMM gives the fitted density of “environment temperature” and the clustering interval of each sample, so this attribute is divided into four discrete intervals. Figure 4 (left) shows the Gaussian mixture density curve of “environment temperature,” and Figure 4 (right) shows how the “environment temperature” samples are distributed over the clustering intervals. The final continuous attribute intervals are shown in Table 5.
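The BIC-based choice of K and the resulting discretization can be sketched for one attribute as below. This is a minimal pure-Python illustration, not the authors' code. Assumptions: BIC is taken in the "larger is better" form 2 ln L − p ln n with p = 3K − 1 free parameters, as in R's mclust, which matches the text's "largest BIC" (scikit-learn's `GaussianMixture.bic` uses the opposite sign, lower is better); EM starts deterministically from equal chunks of the sorted data; a variance floor `min_var` guards against degenerate components; and each interval is the closed [min, max] of the points assigned to a component, whereas Table 5 uses half-open bounds.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, iterations=200, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM (requires len(data) >= k)."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: means of k equal chunks of the sorted data,
    # every component starting with the overall variance.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in xs
    )
    return weights, means, variances, loglik


def select_and_discretize(data, max_k=15):
    """Choose K maximising BIC = 2 ln L - (3K - 1) ln n; return K and intervals."""
    n = len(data)
    best = None
    for k in range(1, min(max_k, n) + 1):
        weights, means, variances, loglik = fit_gmm_1d(data, k)
        bic = 2.0 * loglik - (3 * k - 1) * math.log(n)
        if best is None or bic > best[0]:
            best = (bic, k, weights, means, variances)
    _, k, weights, means, variances = best
    # Hard-assign every point to its most probable component; each
    # component's [min, max] becomes one discrete interval.
    groups = {}
    for x in data:
        j = max(range(k), key=lambda c: weights[c] * normal_pdf(x, means[c], variances[c]))
        groups.setdefault(j, []).append(x)
    return k, sorted((min(g), max(g)) for g in groups.values())


# Hypothetical, well-separated sample standing in for one continuous attribute.
sample = [-10 + 0.1 * i for i in range(20)] + [10 + 0.1 * i for i in range(20)]
k, intervals = select_and_discretize(sample, max_k=2)
```

With the default `max_k=15` this mirrors the search range m = 15 used in the article; the small `max_k` above only keeps the example fast.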

Figure 3.

The BIC value of Gaussian Mixture Model of “environment temperature.”

Figure 4.

Density curve (left) and clustering samples (right) of “environment temperature.”

Table 5.

Continuous attributes discrete intervals partition table.

Attribute interval	Environment temperature	Operating mechanism frequency	Current breaking frequency	Use time
Interval 1	envir temp1: [–11, 6]	open num1: [10, 167]	open cur1: [0, 18]	use time1: [1382, 3403]
Interval 2	envir temp2: (6, 23]	open num2: (167, 324]	open cur2: (18, 36]	use time2: (3403, 5424]
Interval 3	envir temp3: (23, 40]	open num3: (324, 483]	open cur3: (36, 55]	use time3: (5424, 7445]

2. Qualitative analysis based on GMM and CMAR

The rules obtained by CMAR are shown in Table 6. They have relatively high confidence and support and a lift greater than 1, and they clearly show the discrete interval value ranges of the fault-influencing factors for each fault type.

Table 6.

Discrete interval value ranges of different fault-influencing factors for different fault types.

Fault type	Use time	Annual average load level	Operating mechanism frequency	Environment pollution level	Environment temperature	Current breaking frequency
SF6 leakage	(5424, 7445]	Above 80%	(167, 324]	5	(6, 23]	[0, 18]
Operating mechanism anomaly	(5424, 7445]	40%–60%	(324, 483]	4	(6, 23]	(36, 55]
Main components deterioration	[1382, 3403]	40%–60%	(167, 324]	4	[−11, 6]	(36, 55]
Auxiliary components damage	(5424, 7445]	Above 80%	(167, 324]	4	(6, 23]	(36, 55]
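Support, confidence, and lift, the three measures used above to screen the rules, can be computed as in the sketch below. It is only an illustration of these metrics: it does not implement CMAR's FP-growth-based rule mining or its weighted χ² classification over multiple rules, and the attribute keys and toy records are hypothetical.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the class-association rule antecedent -> consequent.

    records:    list of dicts holding already-discretized attribute values
    antecedent: dict mapping attribute -> required value (the rule body)
    consequent: (attribute, value) pair, e.g. ("fault_type", "SF6 leakage")
    """
    n = len(records)
    attr, label = consequent
    covered = [r for r in records if all(r.get(a) == v for a, v in antecedent.items())]
    both = sum(1 for r in covered if r.get(attr) == label)
    class_count = sum(1 for r in records if r.get(attr) == label)
    support = both / n
    confidence = both / len(covered) if covered else 0.0
    # Lift compares the rule's confidence with the class's base rate.
    lift = confidence / (class_count / n) if class_count else 0.0
    return support, confidence, lift


# Hypothetical toy records, not the article's data.
records = [
    {"load_level": "Above 80%", "fault_type": "SF6 leakage"},
    {"load_level": "Above 80%", "fault_type": "SF6 leakage"},
    {"load_level": "Above 80%", "fault_type": "Other faults"},
    {"load_level": "40%–60%", "fault_type": "Other faults"},
]
support, confidence, lift = rule_metrics(
    records, {"load_level": "Above 80%"}, ("fault_type", "SF6 leakage")
)
# support = 0.5, confidence = 2/3, lift = (2/3) / 0.5 = 4/3 > 1
```

A lift above 1 means the factor range raises the frequency of the fault type above its base rate, which is why rules with lift greater than 1 are retained.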

Fault-influencing factors quantitative analysis of multiple-faults joint occurrence probability

1. Quantitative data sets

The quantitative data sets are filtered from the sample space according to the discrete interval value ranges in Table 6. For the fault type “SF6 leakage,” filtering by “current breaking frequency” in [0, 18] yields 10 entries; “operating mechanism frequency” in (167, 324] yields 9 entries; “use time” in (5424, 7445] yields 14 entries; “annual average load level” above 80% yields 8 entries; “environment pollution level” equal to 5 yields 5 entries; and “environment temperature” in (6, 23] yields 12 entries, giving 58 filtered entries in total for “SF6 leakage.” The quantitative data of the other fault types are screened in the same way, which finally yields 473 entries of quantitative analysis sample data, as shown in Table 7.
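The per-factor screening can be sketched as below. It is a minimal illustration under two assumptions: each factor is filtered independently, and the total is the sum of the per-factor counts (as in 10 + 9 + 14 + 8 + 5 + 12 = 58), so a record matching several conditions is counted once per factor. The helper names, keys, and records are hypothetical; the two conditions are taken from the “SF6 leakage” row of Table 6.

```python
def interval(low, high, closed_low=False):
    """Predicate for membership in (low, high], or [low, high] if closed_low."""
    def contains(value):
        lower_ok = low <= value if closed_low else low < value
        return lower_ok and value <= high
    return contains


def equals(level):
    """Predicate for a discrete factor taking one specific level."""
    return lambda value: value == level


def build_quantitative_set(records, conditions):
    """Filter records separately for each factor's condition.

    Returns the per-factor subsets and the total number of filtered entries
    (a record matching several conditions is counted once per factor).
    """
    subsets = {f: [r for r in records if cond(r[f])] for f, cond in conditions.items()}
    return subsets, sum(len(s) for s in subsets.values())


# Hypothetical records and two of the "SF6 leakage" conditions from Table 6.
records = [
    {"current_breaking_frequency": 5, "load_level": "Above 80%"},
    {"current_breaking_frequency": 20, "load_level": "Above 80%"},
    {"current_breaking_frequency": 0, "load_level": "40%–60%"},
]
conditions = {
    "current_breaking_frequency": interval(0, 18, closed_low=True),
    "load_level": equals("Above 80%"),
}
subsets, total = build_quantitative_set(records, conditions)
```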

Table 7.

Quantitative analysis data set (part).

Fault type	Discrete interval of influencing factors	Quantitative analysis sample
Fault type	Discrete interval of influencing factors	Current breaking frequency	Operating mechanism frequency	Use time	Annual average load level	Environment pollution level	Environment temperature
SF6 leakage	Current breaking frequency [0, 18]	0	366	5918	Above 80%	5	25
		0	368	6261	Below 40%	4	17
		2	250	5726	Above 80%	4	17
		3	15	3070	40%–60%	1	19
		3	200	2960	Above 80%	5	17
		4	368	5824	40%–60%	4	3
	…	…	…	…	…	…	…
Operating mechanism anomaly	Operating mechanism frequency (341, 483]	50	432	5980	Below 40%	4	30
		50	432	3216	40%–60%	2	0
		48	432	6191	Below 40%	4	26
		50	445	5918	40%–60%	4	30
		2	445	2961	40%–60%	2	17
		10	445	4171	Below 40%	1	26
	…	…	…	…	…	…	…
Main components deterioration	Use time (1382, 3403]	43	189	1489	Below 40%	4	8
		20	57	1559	40%–60%	5	25
		0	210	1880	Above 80%	2	–6
		45	222	2071	40%–60%	4	26
		0	180	2285	Below 40%	5	1
	…	…	…	…	…	…	…
Auxiliary components damage	Annual average load level above 80%	42	56	5743	Above 80%	4	17
		42	125	5388	Above 80%	5	25
		40	139	5274	Above 80%	3	25
	…	…	…	…	…	…	…

2. Fault-influencing factors quantitative analysis of the multiple-faults joint occurrence probability based on logistic regression

There are five fault types in this instance: “operating mechanism anomaly,” “auxiliary components damage,” “SF6 leakage,” “main components deterioration,” and “other faults.” The dependent variable y takes the value 1 when the fault occurs and 0 when it does not, as shown in equation (10)

y = \begin{cases} 1, & \text{fault occurs} \\ 0, & \text{fault does not occur} \end{cases}

(10)

Let $x_{1}, x_{2}, \dots, x_{6}$ denote the independent variables, where p = 6 corresponds to the six fault-influencing factors; the independent variables of a sample are collected in the vector X, that is

X = (x_{1}, x_{2}, \dots, x_{6})

(11)

Here y is the dependent variable, indicating whether the fault occurs; the independent variables are listed in Table 8.

Table 8.

Independent variables mapping table.

Independent variable mark	Influencing-factors nomenclature
x₁	Current breaking frequency
x₂	Operating mechanism frequency
x₃	Use time
x₄	Annual average load level
x₅	Environment pollution level
x₆	Environment temperature

Based on the data in Table 7, quantitative analysis models are established for “operating mechanism anomaly,” “auxiliary components damage,” “SF6 leakage,” and “main components deterioration.”

1. Calculation of the single-fault occurrence probability

For “operating mechanism anomaly,” the range of “current breaking frequency” is 36–55. A regression equation is established based on equation (5), with “operating mechanism anomaly” as the dependent variable and “current breaking frequency,” “operating mechanism frequency,” “use time,” “annual average load level,” “environment pollution level,” and “environment temperature” as the independent variables, and the following linear predictor g(x) is obtained according to equations (6) and (7)

g(x) = -12.0745 + 15.7764 x_{1} + 2.1547 x_{2} + 0.1257 x_{3} - 0.3214 x_{4} + 0.1577 x_{5} + 0.2506 x_{6}

In the statistical test, the significance level is usually set to 0.1. At this level, the five independent variables x₂, x₃, x₄, x₅, and x₆ (“operating mechanism frequency,” “use time,” “annual average load level,” “environment pollution level,” and “environment temperature”) are not significant and are eliminated, and the weight coefficients are re-estimated after elimination.

Finally, the following regression equations are obtained

L: p(y = 1) = \frac{e^{-10.158 + 15.037 x_{1}}}{1 + e^{-10.158 + 15.037 x_{1}}}
L_{0}: p(y = 1) = \frac{e^{-10.158}}{1 + e^{-10.158}}
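Model L can be evaluated directly from its two coefficients, as in the sketch below. It assumes, as Figure 5 suggests, that x₁ is the normalized “current breaking frequency”; the function name is hypothetical. From the equation itself, the fitted curve crosses p = 0.5 at x₁ = 10.158/15.037 ≈ 0.676.

```python
import math


def logistic_probability(intercept, coefficients, x):
    """p(y = 1 | x) = e^g / (1 + e^g), with g = intercept + sum(b_i * x_i)."""
    g = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    # Evaluate the logistic function in a form that cannot overflow.
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


# Model L from the text: intercept -10.158, coefficient 15.037 on normalized x1.
for x1 in (0.0, 0.5, 0.7, 1.0):
    print(x1, round(logistic_probability(-10.158, [15.037], [x1]), 4))
```

Branching on the sign of g keeps `math.exp` from overflowing for large |g|, which matters when the fitted coefficients are large, as they are here.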

According to equation (6), the log-likelihood of L is ln L = –0.007575845 and that of L₀ is ln L₀ = –10.15803876; the test statistic is then obtained from equation (9)

LR = 2(\ln L - \ln L_{0}) = 2(-0.007575845 + 10.15803876) = 20.30092583
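The statistic can be reproduced from the two log-likelihoods and compared with a χ² distribution, as in the sketch below. One degree of freedom is assumed because L keeps a single predictor beyond the intercept; the text does not spell out the reference distribution of equation (9). For 1 degree of freedom the upper-tail probability has the closed form erfc(√(LR/2)), so only the standard library is needed.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_null):
    """LR = 2 (ln L - ln L0) and its p-value under a chi-square with 1 degree of freedom.

    One degree of freedom because L keeps a single predictor (x1) beyond
    the intercept; for 1 df, P(chi2 > LR) = erfc(sqrt(LR / 2)).
    """
    lr = 2.0 * (loglik_full - loglik_null)
    p_value = math.erfc(math.sqrt(lr / 2.0)) if lr > 0 else 1.0
    return lr, p_value


lr, p = likelihood_ratio_test(-0.007575845, -10.15803876)
print(round(lr, 6))  # 20.300926
```

A p-value below the 0.1 significance level used above (equivalently, LR above the 1-df critical value of about 2.706) indicates that adding x₁ improves significantly on the intercept-only model.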

Because LR exceeds the critical value of the χ² distribution with one degree of freedom at the 0.1 significance level (2.706), L is significant, that is, the model with “current breaking frequency” fits significantly better than L₀. Figure 5 shows the quantitative effect of “current breaking frequency” on “operating mechanism anomaly” over the range 36–55. When the normalized value of “current breaking frequency” is below 0.763636, corresponding to the range 36–43, the occurrence probability of “operating mechanism anomaly” approaches 0 and the fault is not expected to occur. When the normalized value is above 0.763636, corresponding to the range 43–55, the occurrence probability approaches 1, so the fault is likely to occur in this interval; it deserves closer attention, and fault repair should be prepared in advance. “SF6 leakage,” “auxiliary components damage,” and “main components deterioration” are analyzed with respect to “current breaking frequency” in the same way. The final discrete interval value ranges are shown in Table 9.
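The refinement step, keeping the part of a CMAR interval where the fitted probability is high and dropping the part where it is near 0, can be sketched by inverting the logistic function. This is a minimal sketch under stated assumptions: min–max normalization of the factor (the text does not state how x₁ was normalized), a single increasing logistic curve, and a cut-off `threshold` that is a free parameter rather than a value from the article. The function name and the example coefficients are hypothetical.

```python
import math


def refine_interval(intercept, slope, low, high, norm_min, norm_max, threshold=0.5):
    """Part of [low, high] where intercept + slope * x_norm gives p >= threshold.

    ASSUMES min-max normalization x_norm = (x - norm_min) / (norm_max - norm_min)
    and slope > 0, so the fitted probability increases with x.
    """
    if slope <= 0:
        raise ValueError("this sketch only handles an increasing probability curve")
    # Invert the logistic function: g(x_norm) = logit(threshold).
    logit = math.log(threshold / (1.0 - threshold))
    x_norm = (logit - intercept) / slope
    cut = norm_min + x_norm * (norm_max - norm_min)
    if cut > high:
        return None  # the probability never reaches the threshold in this range
    return (max(cut, low), high)


# Hypothetical coefficients on a 0-10 scale: p = 0.5 exactly at x = 5.
print(refine_interval(-5.0, 10.0, 0, 10, 0, 10))  # (5.0, 10)
```

Raising `threshold` toward 1 moves the cut-off to the right and yields a narrower, more conservative interval.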

Figure 5.

Occurrence probability curves of all faults as a function of “current breaking frequency.”

Table 9.

Modified discrete intervals of different fault-influencing factors for different fault types.

Fault type	Use time	Annual average load level	Operating mechanism frequency	Environment pollution level	Environment temperature	Current breaking frequency
SF6 leakage	Non-main	Above 80%	No effect	5	Non-main	[0, 22]
Operating mechanism anomaly	(5424, 7445]	Non-main	(341, 483]	Non-main	No effect	(43, 55]
Main components deterioration	[2379, 2633], [2961, 3216]	Non-main	(180, 251]	No effect	[−11, 3]	(43, 55]
Auxiliary components damage	(6110, 7445]	No effect	(167, 324]	Non-main	Non-main	(38, 55]

The occurrence probability of “operating mechanism anomaly” is analyzed for the other factors in the same way, and the results are shown in Figure 6. The occurrence probability does not exceed 0.5 for any range of “annual average load level,” “environment pollution level,” or “environment temperature,” so these three factors can be considered not to be the main factors of this fault. In contrast, when “operating mechanism frequency” is 341–483, “current breaking frequency” is 43–55, and “use time” is 5424–7445, the fault occurrence probability is close to 1, so these three fault-influencing factors are the main factors and need to be monitored during the operation of switchgears.

Figure 6.

The occurrence probability of “operating mechanism anomaly” for different fault-influencing factors.

2. Calculation of multiple-faults joint occurrence probability

The multiple-faults joint occurrence probability is calculated from the modified discrete interval value ranges. As shown in Table 9, “current breaking frequency” influences three fault types, “operating mechanism anomaly,” “main components deterioration,” and “auxiliary components damage,” and the common range of “current breaking frequency” obtained by the quantitative calculation is 43–55. The joint occurrence probability is calculated according to equation (8). When it is very high, faults occurring with “current breaking frequency” in this range require particular attention, and the three fault types should be considered together when developing the maintenance strategy.

The multiple-faults joint occurrence probability for the other fault-influencing factors is calculated in the same way, and the results are shown in Table 10. When “current breaking frequency” is in the range 43–55, the mean joint occurrence probability of “operating mechanism anomaly,” “auxiliary components damage,” and “main components deterioration” is 0.855686. When “use time” is in the range 6110–7445, the mean joint occurrence probability of “operating mechanism anomaly” and “auxiliary components damage” is 0.898101. Hence, when the joint occurrence probability is close to 1, simultaneous maintenance for the multiple faults should be considered.

Table 10.

Multiple-faults joint occurrence probability (part).

No.	Current breaking frequency (43, 55]	Operating mechanism frequency (180, 251]	Use time (6110, 7445]
1	0.674244	0.180759	0.548961
2	0.669221	0.182389	0.627921
3	0.732469	0.133054	0.893213
4	0.738938	0.082821	0.666268
5	0.764248	0.059054	0.804996
…	…	…	…
Mean	0.855686	0.088478	0.898101
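Equation (8) is not reproduced in this section, so the sketch below does not implement it. Purely as an illustration, it assumes that the faults are conditionally independent given the factor value, in which case the joint occurrence probability is the product of the single-fault probabilities, and it then averages the per-sample values in the way the “Mean” row of Table 10 summarizes them. The function names and inputs are hypothetical.

```python
import math


def joint_probability(single_probs):
    """Joint occurrence probability ASSUMING the faults are conditionally independent."""
    return math.prod(single_probs)


def mean_joint_probability(samples):
    """Mean of the per-sample joint probabilities (cf. the 'Mean' row of Table 10)."""
    values = [joint_probability(p) for p in samples]
    return sum(values) / len(values)


# Hypothetical single-fault probabilities for two samples and two faults.
print(mean_joint_probability([[0.9, 0.8], [1.0, 0.5]]))
```

Under independence the joint probability can never exceed the smallest single-fault probability, which is a quick sanity check on any joint values reported for the same samples.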

Comparative validation analysis

1. Comparative analysis of the method

The method proposed in this article combines qualitative and quantitative analysis and addresses a typical association-based classification and regression problem. To verify it, the Xgboost algorithm is introduced for comparison. Xgboost³¹ is a well-designed GBDT (gradient boosting decision tree) algorithm that processes sparse data efficiently and supports flexible distributed parallel computing. The boosting idea of Xgboost has already been applied to fault diagnosis³² with good results.

Based on fault sample data set 1 in Table 3, the Xgboost algorithm is used to extract the fault-influencing factors; the results are shown in Table 11. The factors extracted by Xgboost are basically consistent with those of the proposed method. The only difference concerns “operating mechanism anomaly,” for which Xgboost additionally extracts “environment temperature,” a factor that the proposed method did not classify as a main factor.

Table 11.

Fault-influencing factors extracted by Xgboost.

Fault type	Fault-influencing factors
SF6 leakage	Annual average load level, environment pollution level, current breaking frequency
Operating mechanism anomaly	Operating mechanism frequency, current breaking frequency, use time, environment temperature
Main components deterioration	Use time, operating mechanism frequency, environment temperature, current breaking frequency
Auxiliary components damage	Use time, operating mechanism frequency, current breaking frequency

Based on these extracted factors, the Xgboost algorithm is used to calculate the single-fault occurrence probability. Figure 7 compares the quantitative effect of “current breaking frequency” on “operating mechanism anomaly” calculated by the two methods: the curves show the same trend, but a comparison of the specific probabilities suggests that the proposed method is more practical. For example, when the normalized value of “current breaking frequency” is 0.4, the occurrence probability of “SF6 leakage” obtained by the proposed method is 1, while that obtained by Xgboost approaches 0.95; when the normalized value is 0.2, the proposed method gives 0.13 and Xgboost gives 0.21. Therefore, the proposed method better reflects the actual fault probability, and the model constructed in this article is more interpretable than the Xgboost model. The discrete interval value ranges calculated by the Xgboost method are shown in Table 12.

Figure 7.

Comparison of the effects calculated by the two methods.

Table 12.

Discrete intervals of different fault-influencing factors for different fault types calculated by the Xgboost method.

Fault type	Use time	Annual average load level	Operating mechanism frequency	Environment pollution level	Environment temperature	Current breaking frequency
SF6 leakage	Non-main	Above 80%	No effect	5	Non-main	[0, 21]
Operating mechanism anomaly	(5447, 7445]	Non-main	(353, 475]	Non-main	[6, 15]	(43, 55]
Main components deterioration	[2414, 2631], [2973, 3216]	Non-main	(191, 251]	No effect	[−11, 3]	(43, 55]
Auxiliary components damage	(6151, 7445]	No effect	(167, 324]	Non-main	Non-main	(39, 55]

2. Multi-region data validation

The fault sample data of high-voltage switchgears from region 2 are shown in Table 13. The final discrete interval value ranges obtained from the fault samples of the two regions are compared in Table 14.

Table 13.

Fault sample data set 2 of high-voltage switchgears operation process (part).

No.	Production company	Equipment type	Operating mechanism type	Annual average load level	Environment pollution level	Environment temperature	Operating mechanism frequency (ABC)	Current breaking frequency	Use time	Fault type
1	ProCmp1	EquType1	MecType1	40%–60%	5	20	138	10	2758	SF6 leakage
2	ProCmp1	EquType2	MecType2	40%–60%	5	15	55	2	3015	Other faults
3	ProCmp3	EquType3	MecType2	40%–60%	5	25	81	8	4536	SF6 leakage
4	ProCmp3	EquType2	MecType1	40%–60%	5	30	224	43	5780	Operating mechanism anomaly
5	ProCmp2	EquType3	MecType3	40%–60%	5	0	65	18	5731	SF6 leakage
…	…	…	…	…	…	…	…	…	…	…
162	ProCmp3	EquType2	MecType2	40%–60%	4	25	101	45	6000	Main components deterioration

Table 14.

Comparison of fault-influencing factors analysis results using different methods.

Fault type	Use time		Annual average load level		Operating mechanism frequency		Environment pollution level		Environment temperature		Current breaking frequency
Fault type	The proposed method	Xgboost	The proposed method	Xgboost	The proposed method	Xgboost	The proposed method	Xgboost	The proposed method	Xgboost	The proposed method	Xgboost
SF6 leakage	Non-main	Non-main	Above 80%	Above 80%	No effect	No effect	5	5	Non-main	Non-main	[0, 22]	[0, 21]
Operating mechanism anomaly	(5424, 7445]	(5447, 7445]	Non-main	Non-main	(341, 483]	(353, 475]	Non-main	Non-main	No effect	[6, 15]	(43, 55]	(43, 55]
Main components deterioration	[2379, 2633], [2961, 3216]	[2414, 2631], [2973, 3216]	Non-main	Non-main	(180, 251]	(191, 251]	No effect	No effect	[−11, 3]	[−11, 3]	(43, 55]	(43, 55]
Auxiliary components damage	(6110, 7445]	(6151, 7445]	No effect	No effect	(167, 324]	(167, 324]	Non-main	Non-main	Non-main	Non-main	(38, 55]	(39, 55]

The comparison in Table 14, based on fault sample data sets from two different regions, shows that the fault-influencing factors of each fault type are the same and that their discrete interval value ranges are basically the same. This supports the effectiveness and generalizability of the proposed method.
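One way to make “basically the same” measurable for the paired intervals in Table 14 is an intersection-over-union score, sketched below. This metric is an illustration added here, not part of the article's method; the function name is hypothetical, endpoint openness is ignored because it has zero length, and multi-part entries such as [2379, 2633], [2961, 3216] would need to be compared piece by piece.

```python
def interval_iou(a, b):
    """Intersection-over-union of two intervals (low, high); endpoint openness is ignored."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 1.0


# "Use time" for "operating mechanism anomaly" in Table 14.
print(round(interval_iou((5424, 7445), (5447, 7445)), 4))  # 0.9886
```

A score of 1 means identical intervals and 0 means disjoint ones, so a table of such scores summarizes the agreement in a single number per cell.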

Conclusion

In this article, we proposed a new method for fault-influencing factors analysis of high-voltage switchgears that builds machine learning algorithms on traditional analysis and integrates qualitative and quantitative analyses. For the qualitative analysis, CMAR based on GMM was used to mine the association rules between fault-influencing factors and faults, and further analysis yielded specific discrete interval value ranges. For the quantitative analysis, logistic regression was adopted to calculate the fault occurrence probability within the corresponding discrete interval value ranges; intervals in which the occurrence probability was close to 0 were eliminated and those in which it was close to 1 were kept, giving accurate discrete interval value ranges, and the multiple-faults joint occurrence probability was also obtained. Taking these accurate discrete interval value ranges and the mean joint occurrence probability into consideration, the method can provide a reference for switchgear overhaul, making it possible to plan the required maintenance of high-voltage switchgears adequately and to improve the operating efficiency of the enterprise to a certain extent.

Footnotes

Acknowledgements

The authors thank the editor, the associate editor, and the reviewers for their valuable comments and suggestions.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by National Natural Science Foundation of China under Grant no. 51505357, Shaanxi Province International Cooperation Project under Grant No. BD18016040001 (2016KW-048), and the Free exploration fund (JB170404).

ORCID iD

Jiantao Chang

References

1. Zhang, Yang SX. An adaptive approach based on KPCA and SVM for real-time fault diagnosis of HVCBs. IEEE T Power Deliver 2011; 26(3): 1960–1971.

2. Zhao, Luo, et al. Power system fault diagnosis based on history driven differential evolution and stochastic time domain simulation. Inform Sci 2014; 275(11): 13–29.

3. Wang, Zhang, Zhao, et al. Fault diagnosis of electric power systems based on fuzzy reasoning spiking neural P systems. IEEE T Power Syst 2015; 30(3): 1182–1194.

4. Groenewald JWD, Aldrich. Root cause analysis of process fault conditions on an industrial concentrator circuit by use of causality maps and extreme learning machines. Mineral Eng 2015; 74: 30–40.

5. Tian, Zhao. Current kernel density estimation based transistor open-circuit fault diagnosis in two-level three phase rectifier. J Electron Lett 2016; 52(21): 1795–1797.

6. Leone, Cristaldi, Turrin. A data-driven prognostic approach based on statistical similarity: An application to industrial circuit breakers. J Meas 2017; 108: 163–170.

7. Zhang, Liu, Wang, et al. Mechanical fault diagnosis for HV circuit breakers based on ensemble empirical mode decomposition energy entropy and support vector machine. J Math Probl Eng 2015; 2015(2): 101757.

8. Huang, Chen, Cai, et al. Mechanical fault diagnosis of high voltage circuit breakers based on variational mode decomposition and multi-layer classifier. J Sensor 2016; 16(11): 1887.

9. Zhu, Mei, Zheng. Adaptive fault diagnosis of HVCBs based on P-SVDD and P-KFCM. J Neurocomputing 2017; 240: 127–136.

10. Ramentol, Gondres, Lajes, et al. Fuzzy-rough imbalanced learning for the diagnosis of high voltage circuit breaker maintenance. J Eng Appl Artif Intell 2016; 48(C): 134–139.

11. Lin, Yang. An intelligent maintenance model to assess the condition-based maintenance of circuit breakers. J Int Trans Electr Energ Syst 2015; 25(10): 2376–2393.

12. Wang, Zhang, et al. The machining error control of blade shape based on multivariate statistical process control. Proc IMechE, Part B: J Engineering Manufacture 2016; 229(11): 1912–1924.

13. Razi-Kazemi, Vakilian, Niayesh, et al. Data mining of online diagnosed waveforms for probabilistic condition assessment of SF6 circuit breakers. IEEE T Power Deliver 2015; 30(3): 1354–1362.

14. Luo, Axinte, et al. A wireless instrumented milling cutter system with embedded PVDF sensors. Mech Syst Signal Pr 2018; 110: 556–568.

15. Rawat, Patel, Celestino, et al. A dominance based rough set classification system for fault diagnosis in electrical smart grid environments. J Artif Intell Rev 2016; 46(3): 389–411.

16. Ksal, Batmaz Testik MC. A review of data mining applications for quality improvement in manufacturing industry. J Expert Syst Appl 2011; 38(10): 13448–13467.

17. Zhang, Ren, Liu, et al. A big data analytics architecture for cleaner manufacturing and maintenance processes of complex products. J Clean Prod 2017; 142(2): 626–641.

18. Chang, Kong, Yin. A novel approach for product makespan prediction in production life cycle. Int J Adv Manuf Tech 2015; 80(5–8): 1433–1448.

19. Astolfi, Castellani, Garinei, et al. Data mining techniques for performance analysis of onshore wind farms. J Applied Energy 2015; 148: 220–233.

20. Simons, Cheung WM. Development of a quantitative analysis system for greener and economically sustainable wind farms. J Clean Prod 2016; 133: 886–898.

21. Cai, Liu, Xie. A real-time fault diagnosis methodology of complex systems using object-oriented Bayesian networks. Mech Syst Signal Pr 2016; 80: 31–44.

22. Cui, et al. Health prognosis approach for manufacturing systems based on quality state task network. Proc IMechE, Part B: J Engineering Manufacture 2019; 233(5): 1573–1587.

23. Han, et al. Mission reliability modeling for multi-station manufacturing system based on Quality State Task Network. Proc IMechE, Part O: J Risk and Reliability 2017; 231(6): 701–715.

24. Menon, Kärkkäinen, Wuest, et al. Industrial internet platforms: a conceptual evaluation from a product lifecycle management perspective. Proc IMechE, Part B: J Engineering Manufacture 2019; 233(5): 1390–1401.

25. Sun. Machine fault diagnosis based on Gaussian mixture model and its application. Int J Adv Manuf Tech 2010; 48(1–4): 205–212.

26. …, et al. Multi-label classification algorithm based on association rules. J Control Decis 2009; 24(4): 574–546.

27. Han, Pei. CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the IEEE international conference on data mining, San Jose, CA, 29 November–2 December 2001, pp. 369–376. New York: IEEE.

28. Nguyen NT. Updating mined class association rules for record insertion. Alphen aan den Rijn: Kluwer, 2015.

29. Nguyen LTT, Hong, et al. CAR-Miner: an efficient algorithm for mining class-association rules. Expert Syst Appl 2013; 40(6): 2305–2311.

30. Zand, Yazdanshenas, Amiri. Change point estimation in phase I monitoring of logistic regression profile. Int J Adv Manuf Tech 2013; 67(9): 2301–2311.

31. Chen, Guestrin. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, 13–17 August 2016, pp. 785–794. New York: ACM.

32. Zhang, Chen, Wang, et al. Application of Xgboost to fault diagnosis of rolling bearings. J Noise Vib Control 2017; 37(4): 166–170.

A novel approach of fault-influencing factors analysis for high-voltage switchgears quality supervision based on industrial big data
