Fault Troubleshooting Using Bayesian Network and Multicriteria Decision Analysis

Abstract

Fault troubleshooting aims to diagnose and repair faults with the highest efficacy at minimum cost. The efficacy depends on multiple criteria, such as the fault probability and the cost, time, and risk of a repair action. This paper proposes a novel fault troubleshooting approach that combines a Bayesian network with multicriteria decision analysis (MCDA), using automobile engine start-up failure as a case study. The Bayesian network is employed to establish a fault diagnostic model for reasoning and for calculating the values of uncertain criteria such as fault probability. MCDA is adopted to integrate the influence of the four criteria and to calculate the utility value of the candidate actions in each troubleshooting step. The approach enables cost-saving, highly efficient, and low-risk troubleshooting.

1. Introduction

A modern electromechanical product such as a vehicle is a complex combination of mechanical and electrical systems, with complex and uncertain interrelationships between its modules and components. This increased complexity makes faults considerably harder to diagnose and troubleshoot. A Bayesian network (BN) is a probability-based modeling technique suited to knowledge-based diagnostic systems. A BN makes it possible to model and reason about uncertainty, which makes it well suited to real-world diagnostic problems where data are uncertain or incomplete, and therefore a suitable solution for diagnosing complex electromechanical systems.

Bayesian networks have been widely used for dependability, risk analysis, and maintenance [1], especially for fault diagnosis. Neil et al. [2] applied BNs to predict the reliability of military vehicles and proposed a generic procedure for building large-scale Bayesian networks. Bayesian belief networks (BBNs) have also been applied to the monitoring of manufacturing processes [3, 4]. Huang et al. [5, 6] constructed a multilayer BN model for fault diagnosis of automotive electronic systems, which can diagnose a fault case with multiple symptoms simultaneously. In addition, Bayesian networks have been used for fault diagnosis of generator rotors [7], solar plants [8], mobile telephone infrastructure [9], and production assembly lines [10].

A review of these studies shows that they use Bayesian networks only for reasoning about and calculating the fault probabilities of components, and do not take into account other factors such as the cost, time, and risk of a repair. In practice, apart from the probability of component failure, these factors significantly affect the repair decisions a diagnostic engineer makes while troubleshooting a fault: a component with a high failure likelihood may be difficult to repair, or its repair may be expensive or carry a high risk of causing new faults or creating a safety hazard. A sensible repair decision should therefore integrate the influence of multiple criteria such as failure likelihood and the cost, time, and risk of the repair, which makes it a multicriteria decision-making problem.

Multicriteria decision analysis (MCDA) evaluates a set of feasible actions against multiple criteria and makes a decision based on the evaluation results [11]. This paper proposes a novel approach for fault troubleshooting that combines a Bayesian network with MCDA. The approach takes multiple criteria, namely failure likelihood and the cost, time, and risk of the repair, into account when deciding on repair actions. The Bayesian network is used to establish a fault diagnostic model for reasoning about and calculating the probabilities of fault sources, while MCDA is employed to calculate quantitative utility values for the candidate actions that integrate the influence of the four criteria. The repair decision is then made based on the utility value of each repair action. Although the paper uses automotive engine start-up failure as a case study, the approach can be adopted for troubleshooting generic electromechanical products. It ensures that the most sensible action is executed at each point of the troubleshooting process and therefore yields an effective and cost-saving repair.

Unlike previous studies that only reason about fault probabilities using Bayesian networks, this work extends fault diagnosis to fault troubleshooting (repair) by considering multiple decision criteria: failure likelihood and the cost, time, and risk of the repair. The contribution of the work is an approach that combines MCDA with a Bayesian network and provides a decision-theoretic strategy for troubleshooting electromechanical products.

2. Fault Troubleshooting Based on Multicriteria Decision Analysis

The purpose of a fault troubleshooting process is to diagnose and repair faults with the highest efficacy at minimum cost. The efficacy of each repair action is determined by the probability of failure and by the time, cost, and risk of the repair action. Multicriteria decision analysis covers the whole process, from formulating the problem to selecting the final action. The key concepts for a fault troubleshooting application are shown in Table 1.

Table 1:

Key concepts for a fault troubleshooting application.

Decision question	Determine which faulty component should be repaired in each troubleshooting step
Decision objective	Diagnose and repair faults with the highest efficacy
Decision maker	Troubleshooter
Criteria	Probability of component failure; time, cost, and risk of each repair action
Set of actions	Repairs of the candidate fault categories or components (the set of actions in each step is deterministic)

2.1. Determination of Action Set and Criteria

In this paper, automotive engine start-up failure is used as an example to illustrate the approach. The troubleshooting process is divided into two steps: the first determines which fault category should be troubleshot, and the second determines which component within that category should be troubleshot. The action set for each step is deterministic. In the first step, an engine start-up failure can be caused by four categories of failure: ignition system failure (Fire_Fault), insufficient fuel supply (Fuel_shortage), oil pressure shortage (OilPressure_shortage), and other factors (Others). The set of actions in Step 1 therefore contains repair Fire_Fault, repair Fuel_shortage, repair OilPressure_shortage, and repair Others. If repair Fire_Fault is selected in the first step, the set of actions available in the second step is repair ignition timing error, repair ignition signal circuit, and repair ignition coil.

As discussed in Section 1, to make a troubleshooting decision, troubleshooters should consider not only the likelihood of a component fault but also the time, cost, and risk of repairing the component. Four evaluation criteria are therefore used:

fault probability,

time,

cost,

risk.

Fault probability is the likelihood that a faulty component is causing the engine start-up failure. Time is the time in minutes needed to complete the repair. Cost is the cost in pounds, including component cost and labor charge. Risk is the risk of causing new faults or creating a safety hazard for the troubleshooter during the repair.

Among the four criteria, fault probability is uncertain and is inferred from a BN diagnostic model. The other three are known for a given action and can be determined from expert knowledge.

2.2. Determining Weights of Criteria

MCDA evaluates the influence of multiple criteria by calculating a utility value in which each criterion carries a weight. Several methods have been proposed for determining the criterion weights, including the ranking method and the pairwise comparison method [12, 13]. The ranking method, used in this work, ranks the criteria in the order of importance perceived by the decision makers and derives the weights from that order. The weights are calculated from the following equation:

w_{c} = \frac{n - r_{c} + 1}{\sum_{x = 1}^{n} (n - r_{x} + 1)},

(1)

where w_c is the weight of criterion c, n is the number of decision criteria, r_c is the rank of criterion c in the order of importance (1 = most important), and the sum in the denominator runs over all n criteria.

Based on the opinion of domain experts, the criteria are ranked in the following order of importance:

fault-probability first

risk second

time third

cost fourth

Applying (1) with n = 4 gives the following weights:

\begin{matrix} w_{\text{fault-probability}} = 0.4, \quad w_{\text{risk}} = 0.3, \\ w_{\text{time}} = 0.2, \quad w_{\text{cost}} = 0.1 . \end{matrix}

(2)
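Equation (1) is the rank-sum weighting rule. The Python sketch below, built around a hypothetical helper `rank_sum_weights`, reproduces the weights in (2) from the expert ranking. Exact fractions are used so that the weights sum to exactly 1.

```python
from fractions import Fraction

def rank_sum_weights(ranking):
    """Weights from Eq. (1): w_c = (n - r_c + 1) / sum_x (n - r_x + 1).

    `ranking` lists criterion names from most to least important,
    so the criterion at list index k has rank r_c = k + 1.
    """
    n = len(ranking)
    scores = {c: n - (k + 1) + 1 for k, c in enumerate(ranking)}
    total = sum(scores.values())
    return {c: Fraction(s, total) for c, s in scores.items()}

# Expert ranking from Section 2.2.
weights = rank_sum_weights(["fault-probability", "risk", "time", "cost"])
for criterion, w in weights.items():
    print(criterion, float(w))
```

With n = 4 the numerators are 4, 3, 2, 1 and the denominator is 10, which gives the values in (2).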

3. Determining Fault Probability Using a BN Model

3.1. Theory of Bayesian Network

A Bayesian network is a graphical model that describes uncertainty about the causal relationships between variables; it consists of nodes, directed links, and probability tables [13]. Nodes represent variables such as failure symptoms, fault classes, and failure causes. Directed links indicate causal relationships between the variables, and conditional probabilities represent the strength of these causal relationships. Because directed links are not allowed to form cycles, the graph underlying a BN is a directed acyclic graph (DAG). Probability calculation in a BN is based on Bayes' theorem: for an observed event A and mutually exclusive and exhaustive events B_1, ..., B_n,

p(B_{i} \mid A) = \frac{p(B_{i} A)}{p(A)} = \frac{p(A \mid B_{i})\, p(B_{i})}{\sum_{j = 1}^{n} p(A \mid B_{j})\, p(B_{j})}, \quad i = 1, 2, 3, \dots, n .

(3)
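As a small numerical illustration of (3), the Python sketch below computes posteriors for three mutually exclusive candidate causes of one symptom. The priors and likelihoods are made-up numbers, and `bayes_posterior` is a hypothetical helper, not part of any library.

```python
def bayes_posterior(priors, likelihoods):
    """Eq. (3): p(B_i | A) = p(A | B_i) p(B_i) / sum_j p(A | B_j) p(B_j).

    priors[i] = p(B_i) for mutually exclusive, exhaustive hypotheses B_i;
    likelihoods[i] = p(A | B_i) for the observed event A.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # p(A, B_i)
    evidence = sum(joint)                                  # p(A), total probability
    return [j / evidence for j in joint]

# Hypothetical numbers: three candidate causes of one symptom A.
posterior = bayes_posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.8])
print([round(p, 3) for p in posterior])
```

The third cause has the lowest prior but becomes the most probable one after the symptom is observed, because it explains the symptom best.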

The goal of building a BN in this paper is to infer the most likely failure cause given a failure symptom, that is, to calculate the posterior probabilities of failure causes given evidence. Calculating a posterior probability requires the joint probability. To simplify the calculation of the joint probability, a BN makes the following conditional independence assumption: each node depends directly only on its parent nodes. Formally, order the nodes {X₁, X₂, …, X_n} so that every node's parents precede it; then for any node X_i with parent set π(X_i) ⊆ {X₁, X₂, …, X_{i − 1}}, X_i is conditionally independent of the remaining nodes in {X₁, …, X_{i − 1}} given π(X_i). By the definition of conditional independence,

p(X_{i} \mid X_{1}, X_{2}, \dots, X_{i - 1}) = p(X_{i} \mid \pi(X_{i})) .

(4)

Figure 1 shows an example BN with three layers and five nodes, X₁, X₂, X₃, X₄, and X₅. By the first equality in (3), the posterior conditional probability of X₂ given X₅ can be obtained as

p(X_{2} = \text{true} \mid X_{5} = \text{true}) = \frac{p(X_{2} = \text{true}, X_{5} = \text{true})}{p(X_{5} = \text{true})},

(5)

where p(X₂ = true, X₅ = true) and p(X₅ = true) are marginal probabilities, obtained by summing the joint probability over the unobserved variables:

\begin{array}{l} p(X_{2} = \text{true}, X_{5} = \text{true}) = \sum_{X_{1}, X_{3}, X_{4}} p(X_{1}, X_{2} = \text{true}, X_{3}, X_{4}, X_{5} = \text{true}), \\ p(X_{5} = \text{true}) = \sum_{X_{1}, X_{2}, X_{3}, X_{4}} p(X_{1}, X_{2}, X_{3}, X_{4}, X_{5} = \text{true}), \end{array}

(6)

where each summand is a value of the joint probability p(X₁, X₂, X₃, X₄, X₅). By the chain rule, this joint probability can be written as

\begin{array}{l} p(X_{1}, X_{2}, X_{3}, X_{4}, X_{5}) = p(X_{1}) \prod_{i = 2}^{5} p(X_{i} \mid X_{1}, X_{2}, \dots, X_{i - 1}) \\ = p(X_{1})\, p(X_{2} \mid X_{1})\, p(X_{3} \mid X_{1}, X_{2}) \times p(X_{4} \mid X_{1}, X_{2}, X_{3})\, p(X_{5} \mid X_{1}, X_{2}, X_{3}, X_{4}) . \end{array}

(7)

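In the network of Figure 1, X₁ and X₂ have no parents, X₃ has parents X₁ and X₂, and X₄ and X₅ each have the single parent X₃. Applying (4), (7) therefore simplifies to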

p(X_{1}, X_{2}, X_{3}, X_{4}, X_{5}) = p(X_{1})\, p(X_{2})\, p(X_{3} \mid X_{1}, X_{2}) \times p(X_{4} \mid X_{3})\, p(X_{5} \mid X_{3}) .

(8)

Figure 1:

An example of BN with five nodes.

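Substituting (8) into (6) makes the posterior probability much easier to calculate, because each factor involves only a node and its parents and can be read directly from that node's probability table.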
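The Python sketch below carries out (5), (6), and (8) by brute-force enumeration for the five-node structure implied by (8): X₁ → X₃ ← X₂, X₃ → X₄, and X₃ → X₅. All probability values are hypothetical, since the paper does not give tables for Figure 1; the code only illustrates how the factorized joint and the marginal sums fit together.

```python
from itertools import product

# Hypothetical probability tables (all nodes Boolean) for the structure of Eq. (8).
P_X1 = 0.2                                            # p(X1 = true)
P_X2 = 0.1                                            # p(X2 = true)
P_X3 = {(True, True): 0.95, (True, False): 0.8,
        (False, True): 0.7, (False, False): 0.05}     # p(X3 = true | X1, X2)
P_X4 = {True: 0.9, False: 0.1}                        # p(X4 = true | X3)
P_X5 = {True: 0.85, False: 0.02}                      # p(X5 = true | X3)

def bern(p_true, value):
    """Probability that a Boolean node takes `value` when p(true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(x1, x2, x3, x4, x5):
    """Factorized joint probability, Eq. (8)."""
    return (bern(P_X1, x1) * bern(P_X2, x2) * bern(P_X3[(x1, x2)], x3)
            * bern(P_X4[x3], x4) * bern(P_X5[x3], x5))

# Eq. (6): marginals obtained by summing the joint over unobserved variables.
p_x2_x5 = sum(joint(x1, True, x3, x4, True)
              for x1, x3, x4 in product([True, False], repeat=3))
p_x5 = sum(joint(x1, x2, x3, x4, True)
           for x1, x2, x3, x4 in product([True, False], repeat=4))

# Eq. (5): posterior of X2 given the evidence X5 = true.
posterior = p_x2_x5 / p_x5
print(f"p(X2=true | X5=true) = {posterior:.4f} (prior was {P_X2})")
```

Enumeration is exponential in the number of nodes. Tools such as Hugin instead use propagation algorithms that exploit the same factorization, but on a five-node example the brute-force sum makes each term of (6) visible.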

Establishing a Bayesian network model for fault diagnosis involves three tasks: (1) determining the network topology; (2) determining the network parameters, which encode the probabilistic dependencies between nodes; and (3) probability propagation, that is, calculating the probability of each node given the evidence.

3.2. Determination of Bayesian Network Topology

Engine start-up failure (the symptom) can be caused by four categories of faults: ignition system failure (Fire_Fault), insufficient fuel supply (Fuel_shortage), oil pressure shortage (OilPressure_shortage), and other factors (Others). Each category contains specific root causes, as shown in Table 2.

Table 2:

Fault causes for engine start-up failure.

Fault category	Fault causes

Fire_Fault	Ignition timing error (FireTime_fault); ignition signal circuit break (FireSignal_break); ignition coil failure (FireCore_fault)

Fuel_shortage	Injector failure (Injection_fault); injection timing error (InjectionTime_fault)

OilPressure_shortage	Pipe leak (Tubing_leak); oil pump circuit break (PumpCirc_break); fuel pressure regulator failure (FuelPreRegu_fault); blocked oil line (OilCirc_jams)

Others	Severely clogged air filter (Filter_fault); severely blocked exhaust pipe (ExhaustPipe_fault); low cylinder pressure (CylinderPre_fault); valve train (distribution mechanism) failure (Distribution_fault)

The topology of the Bayesian network model was built from the fault causes of automobile engine start-up failure using the Hugin Expert software, as shown in Figure 2. The model contains 13 root nodes representing the root causes (orange ellipses), four intermediate nodes representing the fault categories (green ellipses), and one symptom node (yellow ellipse).
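The three-layer topology of Figure 2 can be written down as a parent map, which is a convenient starting point for any BN tool. The sketch below is illustrative only: the node names come from Table 2 and Section 3.4, but the dictionary representation is our own choice and not the format of the Hugin model file.

```python
# Parent map for the BN of Figure 2, built from Table 2.
CAUSES = {
    "Fire_Fault": ["FireTime_fault", "FireSignal_break", "FireCore_fault"],
    "Fuel_shortage": ["Injection_fault", "InjectionTime_fault"],
    "OilPressure_shortage": ["Tubing_leak", "PumpCirc_break",
                             "FuelPreRegu_fault", "OilCirc_jams"],
    "Others": ["Filter_fault", "ExhaustPipe_fault",
               "CylinderPre_fault", "Distribution_fault"],
}
SYMPTOM = "Engine_Doesn't_Start"   # symptom node name used in Section 3.4

parents = {SYMPTOM: list(CAUSES)}  # symptom <- four fault-category nodes
for category, causes in CAUSES.items():
    parents[category] = causes     # fault category <- its root causes
    for cause in causes:
        parents[cause] = []        # root nodes have no parents

roots = [node for node, ps in parents.items() if not ps]
intermediate = list(CAUSES)
print(len(roots), len(intermediate), len(parents))
```

The counts (13 root nodes, 4 intermediate nodes, 18 nodes in total) match the description of Figure 2.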

Figure 2:

BN for the automotive engine start-up failure.

3.3. Determination of Network Parameters

In the network, each node has two states, “fault” and “normal”, and a probability table obtained from prior expert knowledge or historical data. The fault-cause nodes, that is, the root nodes, are described by prior probabilities; all other nodes use conditional probability tables to express their dependence on their parents. Table 3 shows the prior probability table of the root node FuelPreReg_fault, and Table 4 shows the conditional probability table of the fault-category node Fire_Fault.

Table 3:

Prior probability table of the root node.

FuelPreReg_fault
Fault	0.36
Normal	0.64

Table 4:

Conditional probability table of Fire_Fault.

FireCore_fault	FireSignal_break	FireTime_fault	p(Fire_Fault = Fault)	p(Fire_Fault = Normal)
Fault	Fault	Fault	0.98	0.02
Fault	Fault	Normal	0.92	0.08
Fault	Normal	Fault	0.91	0.09
Fault	Normal	Normal	0.89	0.11
Normal	Fault	Fault	0.7	0.3
Normal	Fault	Normal	0.6	0.4
Normal	Normal	Fault	0.68	0.32
Normal	Normal	Normal	0.1	0.9

3.4. Probability Propagation

Once the model structure and the probability tables of all nodes are established, the BN can propagate probabilities given evidence. When the engine fails to start (the state of the symptom node Engine_Doesn't_Start is set to “fault”), the BN model infers the posterior probability of every other node. Figure 3 shows the probability propagation results. Among the fault categories, OilPressure_shortage has the highest fault probability, 0.8633, and within this category the fault cause OilCirc_jams has the highest probability, 0.5929.

Figure 3:

Probability propagation of the BN model.

Similarly, when new evidence such as a test result is entered into the model, the model updates the probabilities of all other nodes.
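The effect of entering evidence can be seen on the Fire_Fault fragment of the network alone. The Python sketch below uses the conditional probability table of Table 4, read with FireCore_fault as the outermost parent. The root-node priors are hypothetical because the paper does not list them. The sketch computes the posterior of each ignition cause once Fire_Fault is known to be faulty, then again after a test shows the ignition coil to be normal.

```python
from itertools import product

# Table 4: p(Fire_Fault = fault | FireCore_fault, FireSignal_break, FireTime_fault),
# keyed by (core, signal, time) with True meaning "fault".
CPT_FIRE = {
    (True, True, True): 0.98, (True, True, False): 0.92,
    (True, False, True): 0.91, (True, False, False): 0.89,
    (False, True, True): 0.70, (False, True, False): 0.60,
    (False, False, True): 0.68, (False, False, False): 0.10,
}
# Hypothetical root-node priors p(cause = fault); not taken from the paper.
PRIOR = {"FireCore_fault": 0.4, "FireSignal_break": 0.2, "FireTime_fault": 0.3}
ORDER = ("FireCore_fault", "FireSignal_break", "FireTime_fault")

def posteriors(evidence):
    """Exact inference by enumeration on the Fire_Fault fragment.

    `evidence` maps node names (including "Fire_Fault") to True (fault)
    or False (normal). Returns p(cause = fault | evidence) per root cause.
    """
    total = 0.0
    fault_mass = dict.fromkeys(ORDER, 0.0)
    for states in product([True, False], repeat=len(ORDER)):
        assign = dict(zip(ORDER, states))
        # Skip configurations that contradict an observed root cause.
        if any(evidence.get(name, s) != s for name, s in assign.items()):
            continue
        # Prior weight of this configuration (root nodes are independent).
        w = 1.0
        for name, s in assign.items():
            w *= PRIOR[name] if s else 1.0 - PRIOR[name]
        # Likelihood of the observed Fire_Fault state, read from Table 4.
        if "Fire_Fault" in evidence:
            p_fault = CPT_FIRE[states]
            w *= p_fault if evidence["Fire_Fault"] else 1.0 - p_fault
        total += w
        for name, s in assign.items():
            if s:
                fault_mass[name] += w
    return {name: mass / total for name, mass in fault_mass.items()}

before = posteriors({"Fire_Fault": True})
after = posteriors({"Fire_Fault": True, "FireCore_fault": False})
for name in ORDER:
    print(f"{name:18s} {PRIOR[name]:.2f} -> {before[name]:.3f} -> {after[name]:.3f}")
```

Ruling out the coil shifts probability mass onto the two remaining ignition causes. This is the kind of update the full model performs for every node when a test result is entered.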

4. Making Troubleshooting Decision

Troubleshooting decision making determines the repair action to be conducted in each troubleshooting step. The MCDA method is used to calculate the utility values of the candidate actions, integrating the influence of the four criteria, and the action with the highest utility value is selected. One of the MCDA evaluation methods, the crude multiattribute utility approach [14], is adopted in this work. Each criterion is assumed to be measurable on a ratio scale, and the value g_i(a) of criterion i for action a is normalized to the scale [0, 1], where 0 represents the worst performance on that criterion and 1 the best. For example, risk is measured on a [0, 1] scale on which higher values indicate lower risk, so an action with risk = 0.8 is less risky than one with risk = 0.6. Likewise, time in minutes is normalized to [0, 1] so that an action with time = 0.8 takes less time to complete than one with time = 0.6.
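The paper does not state the mapping used to obtain the normalized values in Tables 5 and 7; those values are not min-max scaled, since no entry is exactly 0 or 1. As one common alternative, the Python sketch below applies min-max normalization. A hypothetical helper `normalize` maps raw values so that the best action scores 1 and the worst scores 0, flipping the scale for "smaller is better" criteria such as time in minutes or cost in pounds.

```python
def normalize(values, larger_is_better):
    """Min-max normalization of raw criterion values to [0, 1] (1 = best).

    This is an assumed scheme for illustration, not the paper's own mapping.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # all actions equally good on this criterion
    if larger_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]

# Hypothetical repair times in minutes for three actions (shorter is better).
print(normalize([20, 45, 70], larger_is_better=False))
```

Min-max scaling depends on the set of actions being compared, so adding or removing an action can change every normalized value. A fixed reference scale, such as a maximum acceptable repair time, avoids this.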

Table 5 lists the values of the four criteria, normalized to [0, 1], for the first step, which decides which fault category to repair. For example, among the four repair actions, the best value of risk is 0.8 (repair Others) and the best value of time is 0.7 (repair Fire_Fault). The values of the uncertain criterion, fault probability, are obtained from the BN.

Table 5:

Criterion values g_i normalized to a [0, 1] scale in Step 1.

Action	Risk	Time	Cost	Fault probability

Repair Fire_Fault	0.6	0.7	0.6	0.76
Repair Fuel_shortage	0.7	0.5	0.7	0.47
Repair OilPressure_shortage	0.5	0.4	0.4	0.86
Repair Others	0.8	0.6	0.5	0.5

As explained in Section 2.2, each criterion i is assigned a weight w_i that represents its importance. The overall utility U(a) of an action a is calculated as the weighted sum

U(a) = \sum_{i = 1}^{n} w_{i}\, g_{i}(a) .

(9)

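Table 6 shows the calculated utility values of the four actions in the first decision step. Repair Fire_Fault has the highest utility value, 0.684, so troubleshooters should examine the Fire_Fault category first. This holds even though OilPressure_shortage has the highest fault probability (0.86), because Fire_Fault scores better on risk, time, and cost, which outweighs OilPressure_shortage's advantage in fault probability.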

Table 6:

Weighted utilities of actions in Step 1.

Action	Risk (w = 0.3)	Time (w = 0.2)	Cost (w = 0.1)	Fault probability (w = 0.4)	Utility value

Repair Fire_Fault	0.18	0.14	0.06	0.304	0.684
Repair Fuel_shortage	0.21	0.10	0.07	0.188	0.568
Repair OilPressure_shortage	0.15	0.08	0.04	0.344	0.614
Repair Others	0.24	0.12	0.05	0.2	0.61
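The weighted sum in (9) is simple to reproduce. The Python sketch below recomputes the utilities of Table 6 from the normalized values in Table 5 and the weights in (2). It also compares the outcome with a choice based on fault probability alone. The helper name `utility` is hypothetical.

```python
# Weights from Eq. (2).
WEIGHTS = {"risk": 0.3, "time": 0.2, "cost": 0.1, "fault_probability": 0.4}

# Table 5: normalized values g_i(a) in [0, 1] (1 = best) for the Step 1 actions.
STEP1 = {
    "Repair Fire_Fault":           {"risk": 0.6, "time": 0.7, "cost": 0.6, "fault_probability": 0.76},
    "Repair Fuel_shortage":        {"risk": 0.7, "time": 0.5, "cost": 0.7, "fault_probability": 0.47},
    "Repair OilPressure_shortage": {"risk": 0.5, "time": 0.4, "cost": 0.4, "fault_probability": 0.86},
    "Repair Others":               {"risk": 0.8, "time": 0.6, "cost": 0.5, "fault_probability": 0.50},
}

def utility(g, weights=WEIGHTS):
    """Overall utility U(a) = sum_i w_i * g_i(a), Eq. (9)."""
    return sum(w * g[criterion] for criterion, w in weights.items())

utilities = {a: utility(g) for a, g in STEP1.items()}
by_utility = max(utilities, key=utilities.get)
by_probability = max(STEP1, key=lambda a: STEP1[a]["fault_probability"])

for action, u in sorted(utilities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{action:28s} U = {u:.3f}")
print("Highest utility:          ", by_utility)
print("Highest fault probability:", by_probability)
```

The two rankings disagree. A purely probability-driven strategy would start with OilPressure_shortage, while the multicriteria utility selects Fire_Fault.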

Since repair Fire_Fault was selected in Step 1, the set of actions in the second troubleshooting step contains repair FireTime_fault, repair FireSignal_break, and repair FireCore_fault. Table 7 gives the values of the four criteria, normalized to [0, 1], for each action in the second step.

Table 7:

Criterion values normalized to a [0, 1] scale in Step 2.

Action	Risk	Time	Cost	Fault probability

Repair FireTime_fault	0.6	0.7	0.4	0.43
Repair FireSignal_break	0.4	0.8	0.8	0.37
Repair FireCore_fault	0.5	0.6	0.7	0.57

Table 8 shows the calculated utility values of the three actions in the second decision step. Repair FireCore_fault has the highest utility value, 0.568, so the troubleshooting decision is to repair FireCore_fault.

Table 8:

Weighted utilities of actions in Step 2.

Action	Risk (w = 0.3)	Time (w = 0.2)	Cost (w = 0.1)	Fault probability (w = 0.4)	Utility value

Repair FireTime_fault	0.18	0.14	0.04	0.172	0.532
Repair FireSignal_break	0.12	0.16	0.08	0.148	0.508
Repair FireCore_fault	0.15	0.12	0.07	0.228	0.568
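The two decision steps chain naturally: the category chosen in Step 1 determines which component table is used in Step 2. The Python sketch below reproduces the complete decision sequence from Tables 5 and 7. Only the Fire_Fault branch is tabulated in the paper, so the other branches would need their own criterion tables. The helper `best_action` is hypothetical.

```python
# Weights from Eq. (2), in the column order risk, time, cost, fault probability.
W = (0.3, 0.2, 0.1, 0.4)

# Normalized criterion values from Table 5 (Step 1).
STEP1 = {
    "Fire_Fault": (0.6, 0.7, 0.6, 0.76),
    "Fuel_shortage": (0.7, 0.5, 0.7, 0.47),
    "OilPressure_shortage": (0.5, 0.4, 0.4, 0.86),
    "Others": (0.8, 0.6, 0.5, 0.5),
}
# Normalized criterion values from Table 7 (Step 2), keyed by Step 1 category.
STEP2 = {
    "Fire_Fault": {
        "FireTime_fault": (0.6, 0.7, 0.4, 0.43),
        "FireSignal_break": (0.4, 0.8, 0.8, 0.37),
        "FireCore_fault": (0.5, 0.6, 0.7, 0.57),
    },
}

def best_action(options):
    """Return (action, utility) with the largest weighted-sum utility, Eq. (9)."""
    scored = {a: sum(w * g for w, g in zip(W, vals)) for a, vals in options.items()}
    action = max(scored, key=scored.get)
    return action, scored[action]

category, u1 = best_action(STEP1)
component, u2 = best_action(STEP2[category])
print(f"Step 1: repair {category} (U = {u1:.3f})")
print(f"Step 2: repair {component} (U = {u2:.3f})")
```

In a full implementation the Step 2 fault probabilities would be the BN posteriors after the Step 1 evidence has been entered, rather than fixed numbers.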

5. Conclusions

Fault troubleshooting aims to diagnose and repair faults with the highest efficacy at minimum cost, and the efficacy depends on multiple criteria, including the fault probability and the cost, time, and risk of a repair action. This paper proposes a novel fault troubleshooting decision method based on a Bayesian network and MCDA. The approach ensures that the most sensible repair action is selected in each troubleshooting step, thereby enabling cost-saving, highly efficient, and low-risk troubleshooting. Although the paper uses automobile engine start-up failure as a case study, the approach is applicable to troubleshooting generic electromechanical products. In summary, the approach consists of the following steps: (1) identify the set of possible actions; (2) identify the set of criteria, that is, the attributes of the actions that determine the choice; (3) distinguish uncertain criteria such as fault likelihood from certain criteria such as time, cost, and risk; (4) build a Bayesian network model to infer the uncertain criteria; (5) determine the value and weight of each criterion, with criterion values normalized to a [0, 1] scale; (6) calculate the overall utility value of each action using the weighted sum method; and (7) make the decision based on the utility values obtained, giving actions with greater utility values higher priority.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was sponsored by the National Natural Science Foundation of China (project No. 61374197) and the Science and Technology Commission of Shanghai Municipality (project No. 13510502600), and was supported by the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.

References

1. Weber, Medina-Oliva, Simon, and Iung, “Overview on Bayesian networks applications for dependability, risk analysis and maintenance areas,” Engineering Applications of Artificial Intelligence, vol. 25, no. 4, pp. 671–682, 2012.

2. Neil, Fenton, Forey, and Harris, “Using Bayesian belief networks to predict the reliability of military vehicles,” Computing and Control Engineering Journal, vol. 12, no. 1, pp. 11–20, 2001.

3. Kang C. W. and Golay M. W., “A Bayesian belief network-based advisory system for operational availability focused diagnosis of complex nuclear power systems,” Expert Systems with Applications, vol. 17, no. 1, pp. 21–32, 1999.

4. Wolbrecht, D'Ambrosio, Paasch, and Kirby, “Monitoring and diagnosis of a multistage manufacturing process using Bayesian networks,” Artificial Intelligence for Engineering Design, Analysis and Manufacturing, vol. 14, no. 1, pp. 53–67, 2000.

5. Huang, McMurran, Dhadyalla, and Jones R. P., “Probability based vehicle fault diagnosis: Bayesian network method,” Journal of Intelligent Manufacturing, vol. 19, no. 3, pp. 301–311, 2008.

6. Huang, Antory, and Jones R. P., “Bayesian belief network based fault diagnosis in automotive electronic systems,” in Proceedings of the 8th International Symposium on Advanced Vehicle Control, pp. 469–474, 2006.

7. B. G., “Intelligent fault inference for rotating flexible rotors using Bayesian belief network,” Expert Systems with Applications, vol. 39, no. 1, pp. 816–822, 2012.

8. Coleman and Zalewski, “Intelligent fault detection and diagnostics in solar plants,” in Proceedings of the 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS '2011), pp. 948–953, September 2011.

9. Chan and McNaught K. R., “Using Bayesian networks to improve fault diagnosis during manufacturing tests of mobile telephone infrastructure,” Journal of the Operational Research Society, vol. 59, no. 4, pp. 423–430, 2008.

10. Liu and Jin, “Application of Bayesian networks for diagnostics in the assembly process by considering small measurement data sets,” International Journal of Advanced Manufacturing Technology, vol. 2, pp. 1–9, 2012.

11. Eastman J. R., Jin W., Kyem P. A. K., and Toledano, “Raster procedures for multi-criteria/multi-objective decisions,” Photogrammetric Engineering & Remote Sensing, vol. 61, no. 5, pp. 539–547, 1995.

12. Saaty T. L., The Analytic Hierarchy Process, McGraw-Hill, New York, NY, USA, 1980.

13. Yang, Jiang, and Zhai, “Exploring a method for ranking objects based on pairwise comparison,” in Proceedings of the 4th International Joint Conference on Computational Sciences and Optimization (CSO '11), pp. 382–386, April 2011.

14. Vincke, Multi-Criteria Decision Aid, Wiley, New York, NY, USA, 1992.