Abstract
Fault troubleshooting aims to diagnose and repair faults with the highest efficacy and at minimum cost. The efficacy depends on multiple criteria such as the fault probability, cost, time, and risk of a repair action. This paper proposes a novel fault troubleshooting approach that combines a Bayesian network with multicriteria decision analysis (MCDA). An automobile engine start-up failure is used as a case study. The Bayesian network is employed to establish a fault diagnostic model for reasoning and for calculating standard values of uncertain criteria such as fault probability. MCDA is adopted to integrate the influence of the four criteria and to calculate the utility value of the actions in each troubleshooting step. The approach enables cost-saving, highly efficient, and low-risk troubleshooting.
1. Introduction
A modern electromechanical product such as a vehicle is a complex combination of mechanical and electrical systems, with complex and uncertain interrelationships between modules and components. This complexity makes diagnosing and troubleshooting a fault very difficult. A Bayesian network (BN) is a probability-based modeling technique that is suitable for knowledge-based diagnostic systems. A BN enables us to model and reason about uncertainty, which makes it well suited to diagnosing real-world problems where the data are uncertain or incomplete. It is therefore a suitable solution for diagnosing complex electromechanical systems.
Bayesian networks have been widely used in dependability, risk analysis, and maintenance applications [1], especially for fault diagnosis. Neil et al. [2] applied a BN to predict the reliability of military vehicles and proposed a generic procedure for building large-scale Bayesian networks. BNs have also been applied to the monitoring of manufacturing processes [3, 4]. Huang et al. [5, 6] constructed a multilayer BN model for fault diagnosis of automobile electronic systems, which can diagnose a fault case with multiple simultaneous symptoms. In addition, Bayesian networks have been used for fault diagnosis of generator rotors [7], solar plants [8], mobile telephone infrastructure [9], and production assembly lines [10].
A review of these studies indicates that they use Bayesian networks only for reasoning and for calculating the fault probabilities of components; they do not take into account other influencing factors such as the cost, time, and risk of a repair. In fact, apart from the probability of a component's failure, these factors significantly affect the repair decision when a diagnostic engineer is troubleshooting a fault. An element with a high failure likelihood may be difficult to repair, or the repair action may be costly or carry a high risk of causing new faults or creating a safety issue. A sensible repair decision should integrate the influence of multiple criteria such as the failure likelihood, cost, time, and risk of the repair. This is a multicriteria decision-making problem.
Multicriteria decision analysis (MCDA) is a method that evaluates a set of feasible actions against multiple criteria and makes a decision based on the evaluation results [11]. This paper proposes a novel approach to fault troubleshooting that combines a Bayesian network with MCDA. The approach takes multiple criteria into account when deciding on repair actions: the failure likelihood, cost, time, and risk of the repair. The Bayesian network is used to establish a fault diagnostic model for reasoning and for calculating the probabilities of fault sources, while MCDA is employed to calculate quantitative utility values of the candidate actions, integrating the influence of the four criteria. The repair decision is made based on the utility value of each repair action. Although the paper uses an automotive engine start-up failure as a case study, the approach can be adopted for troubleshooting generic electromechanical products. It is intended to ensure that the most sensible action is executed at each point in the troubleshooting process, leading to an effective and cost-saving repair.
Unlike other studies, which use a Bayesian network only to reason about fault probabilities, this work extends fault diagnosis into fault troubleshooting (repair) by considering multiple decision criteria: the failure likelihood, cost, time, and risk of the repair. The contribution of the work is an approach that combines MCDA with a Bayesian network and provides a decision-theoretic strategy for troubleshooting electromechanical products.
2. Fault Troubleshooting Based on Multicriteria Decision Analysis
The purpose of a fault troubleshooting process is to diagnose and repair faults with the highest efficacy and at minimum cost. The efficacy of each repair action is determined by the probability of failure and by the time, cost, and risk of the repair action. Multicriteria decision analysis is a process that runs from formulating the problem to selecting the final action. The key concepts for a fault troubleshooting application are shown in Table 1.
Table 1: Key concepts for a fault troubleshooting application.
2.1. Determination of Action Set and Criteria
In this paper, an automotive engine start-up failure is used as an example to illustrate the approach. The troubleshooting process is divided into two steps: the first determines which fault category should be troubleshot; the second determines which component should be troubleshot. The action set for each step is deterministic. In the first step, an automotive engine start-up failure can be caused by four categories of failure: ignition system failure (Fire-Fault), fuel supply quantity insufficiency (Fuel-shortage), oil pressure shortage, and other factors. The set of actions in Step 1 therefore contains repair Fire-Fault, repair Fuel-shortage, repair OilPressure-shortage, and repair other factors. If the action selected in the first step is repair Fire-Fault, the set of actions selectable in the second step is repair ignition timing error, repair ignition signal circuit, and repair ignition coil.
As discussed in Section 1, to make a troubleshooting decision, troubleshooters should not only consider the likelihood of component fault but also time, cost, and risk of repairing the component. Four evaluation criteria are determined:
fault probability,
time,
cost,
risk.
Fault probability is the likelihood that a faulty component is the cause of the automotive engine start-up failure. Time is the time in minutes needed to complete the repair. Cost is the cost in pounds, including the component cost and the labor charge. Risk is the risk of causing new faults or of creating a safety issue for the troubleshooter during the repair.
Among the four criteria, fault probability is uncertain and can be inferred from a BN diagnostic model. The other three are certain for a specific action and can be determined from expert knowledge.
2.2. Determining Weights of Criteria
MCDA evaluates the influence of multiple criteria by calculating a utility value in which each criterion carries a certain weight. Several methods have been introduced for determining the criterion weights, including the ranking method and the pairwise comparison method [12, 13]. The ranking method, which is used in this work, ranks the criteria by the importance that decision makers assign to them and determines the weights from that order. The weights are calculated from the following equation:
where $w_c$ is the weight of criterion $c$, $n$ is the number of decision criteria, and $r_c$ is the position of criterion $c$ in the order of importance.
Taking opinion of domain experts, importance order of each criterion is defined as follows:
fault-probability first
risk second
time third
cost fourth
The weight of each criterion calculated by ranking method is shown as follows:
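As an illustration only, the following sketch assumes the rank-sum rule, $w_c = (n - r_c + 1) / \sum_{k=1}^{n} (n - r_k + 1)$, which is one common ranking method that uses the symbols defined above. Both the choice of rule and the function name `rank_sum_weights` are assumptions, so the weights it prints are not necessarily those used in the paper.

```python
def rank_sum_weights(ranked_criteria):
    """Return rank-sum weights for criteria listed from most to least important.

    Assumption: w_c = (n - r_c + 1) / sum_k (n - r_k + 1), with r_c the
    1-based position of criterion c in the importance order.
    """
    n = len(ranked_criteria)
    # The criterion at position r receives the score n - r + 1.
    scores = {c: n - r + 1 for r, c in enumerate(ranked_criteria, start=1)}
    total = sum(scores.values())  # equals n * (n + 1) / 2
    return {c: s / total for c, s in scores.items()}


# Importance order taken from the text above.
weights = rank_sum_weights(["fault_probability", "risk", "time", "cost"])
print(weights)
```

Under this assumed rule the four criteria receive weights proportional to 4, 3, 2, and 1, which sum to one.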
3. Determining Fault Probability Using a BN Model
3.1. Theory of Bayesian Network
A Bayesian network is a graphical model that describes uncertainty about the causal relationships between variables and consists of nodes, directed links, and probability tables [13]. Nodes represent variables, which can be failure symptoms, fault classes, or failure causes. Directed links indicate causal relationships between the variables. Conditional probabilities express the strength of the causal relationship between nodes. Because the directed links are not allowed to form cycles, the graph of a BN is a directed acyclic graph. Probability calculation in a BN is based on Bayes' theorem:
$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)} \tag{3}$$
The BN in this paper is built to infer the most likely failure cause given a failure symptom, that is, to calculate the posterior probabilities of failure causes given evidence. Calculating a posterior probability involves calculating the joint probability. To simplify this calculation, a BN assumes conditional independence: each node depends directly only on its parent nodes and, given them, is conditionally independent of the other nodes that precede it. That is, for any node $X_i$ in a collection of nodes $\{X_1, X_2, \ldots, X_i, \ldots, X_n\}$ with parent set $\pi(X_i) \subseteq \{X_1, X_2, \ldots, X_{i-1}\}$, $X_i$ is conditionally independent of the remaining nodes in $\{X_1, X_2, \ldots, X_{i-1}\}$ given $\pi(X_i)$. By the definition of conditional independence,
$$p(X_i \mid X_1, X_2, \ldots, X_{i-1}) = p(X_i \mid \pi(X_i)) \tag{4}$$
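Figure 1 shows an example of a BN with three layers and five nodes $X_1, X_2, X_3, X_4, X_5$. According to (3), the posterior conditional probability can be obtained by
$$p(X_2 = \text{true} \mid X_5 = \text{true}) = \frac{p(X_2 = \text{true}, X_5 = \text{true})}{p(X_5 = \text{true})} \tag{5}$$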
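where $p(X_2 = \text{true}, X_5 = \text{true})$ and $p(X_5 = \text{true})$ are marginal probabilities and can be calculated from
$$p(X_2 = \text{true}, X_5 = \text{true}) = \sum_{X_1, X_3, X_4} p(X_2 = \text{true}, X_1, X_3, X_4, X_5 = \text{true}), \qquad p(X_5 = \text{true}) = \sum_{X_1, X_2, X_3, X_4} p(X_1, X_2, X_3, X_4, X_5 = \text{true}) \tag{6}$$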
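where $p(X_2 = \text{true}, X_1, X_3, X_4, X_5 = \text{true})$ and $p(X_1, X_2, X_3, X_4, X_5 = \text{true})$ involve calculating the joint probability. According to the definition of joint probability and the chain rule, the joint probability $p(X_1, X_2, X_3, X_4, X_5)$ can be calculated from
$$p(X_1, X_2, X_3, X_4, X_5) = p(X_1)\, p(X_2 \mid X_1)\, p(X_3 \mid X_1, X_2)\, p(X_4 \mid X_1, X_2, X_3)\, p(X_5 \mid X_1, X_2, X_3, X_4) \tag{7}$$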
Applying (4) to each factor, (7) can be simplified to
$$p(X_1, X_2, X_3, X_4, X_5) = \prod_{i=1}^{5} p(X_i \mid \pi(X_i)) \tag{8}$$

Figure 1: An example of a BN with five nodes.
Substituting (8) into (6) makes the calculus of posterior probability much easier.
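The calculation can be sketched by brute-force enumeration: the joint probability is the product of each node's probability given its parents, marginals are sums of the joint over the unobserved nodes, and the posterior is the ratio of two marginals. The structure and the probability values below are hypothetical (they are not taken from Figure 1), and the names `PARENTS`, `P_TRUE`, `joint`, `marginal`, and `posterior` are introduced only for this illustration.

```python
from itertools import product

# Hypothetical three-layer structure (NOT the one in Figure 1):
# X1 and X2 are roots, X3 depends on (X1, X2), X4 on X2, X5 on (X3, X4).
PARENTS = {"X1": (), "X2": (), "X3": ("X1", "X2"), "X4": ("X2",), "X5": ("X3", "X4")}

# Made-up tables: P(node = True | parent values), keyed by the parent-value tuple.
P_TRUE = {
    "X1": {(): 0.1},
    "X2": {(): 0.2},
    "X3": {(True, True): 0.9, (True, False): 0.6, (False, True): 0.5, (False, False): 0.05},
    "X4": {(True,): 0.7, (False,): 0.1},
    "X5": {(True, True): 0.95, (True, False): 0.8, (False, True): 0.6, (False, False): 0.01},
}
NODES = list(PARENTS)


def joint(assignment):
    """Joint probability as the product of P(X_i | parents(X_i))."""
    p = 1.0
    for node in NODES:
        parent_values = tuple(assignment[q] for q in PARENTS[node])
        p_true = P_TRUE[node][parent_values]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def marginal(evidence):
    """Sum the joint over every node that is not fixed by `evidence`."""
    free = [n for n in NODES if n not in evidence]
    total = 0.0
    for values in product([True, False], repeat=len(free)):
        total += joint({**evidence, **dict(zip(free, values))})
    return total


def posterior(query, evidence):
    """P(query | evidence) as a ratio of two marginal probabilities."""
    return marginal({**query, **evidence}) / marginal(evidence)


p = posterior({"X2": True}, {"X5": True})
print(f"P(X2=true | X5=true) = {p:.4f}")
```

Enumeration grows exponentially with the number of nodes, so it is only practical for small examples such as this one.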
Establishing the Bayesian network model for fault diagnosis involves three tasks: (1) determining the network topology; (2) determining the network parameters, which express the probabilistic dependency between nodes; and (3) probability propagation, which calculates the probability of each node given the evidence.
3.2. Determination of Bayesian Network Topology
An engine start-up failure (the symptom) can be caused by four categories of faults: ignition system failure (Fire_Fault), fuel supply quantity insufficiency (Fuel-shortage), oil pressure shortage (OilPressure_shortage), and other factors. Each category contains specific root causes, as shown in Table 2.
Table 2: Fault causes for engine start-up failure.
The topology of the Bayesian network model was built according to the fault causes of automobile engine start-up failure by using Hugin Expert software, as shown in Figure 2. The model contains 13 root nodes indicating the fault root causes drawn with orange ellipses, four intermediate nodes indicating fault category drawn with green ellipses, and a symptom node drawn with a yellow ellipse.

Figure 2: BN for the automotive engine start-up failure.
3.3. Determination of Network Parameters
In the network, each node has two states, “fault” and “normal,” and a probability table acquired from prior expert knowledge or historical data. The fault cause nodes, that is, the root nodes, are described by prior probabilities. The other nodes use conditional probabilities to express their dependence on their parent nodes. Table 3 shows the prior probability table of the root node FuelPreReg_fault. Table 4 shows the conditional probability table of the fault category node Fire_Fault.
Table 3: Prior probability table of the root node.
Table 4: Conditional probability table of Fire_Fault.
3.4. Probability Propagation
Once the model structure and the probability tables of all nodes are established, the BN can propagate probabilities given evidence. When the engine cannot start properly (the state of Engine_Doesn't_Start is set to “fault”), the BN model infers and calculates the probability of each node. Figure 3 shows the probability propagation results of the BN model. As shown in Figure 3, OilPressure_shortage has the highest fault probability among the fault categories, 0.8633, and the fault cause OilCirc_jams has the highest probability within this category, 0.5929.

Figure 3: Probability propagation of the BN model.
Similarly, when new evidence such as a test result is entered into the model, the model updates the probabilities of all other nodes.
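The effect of adding evidence can be illustrated with a deliberately small, hypothetical network that has two independent root causes and one symptom. All probabilities and names here (`PRIOR`, `P_SYMPTOM`, `posterior_cause_a`) are made up for the illustration and are unrelated to the engine model. The sketch compares the posterior of one cause given only the symptom with its posterior after a test has ruled out the other cause.

```python
from itertools import product

# Hypothetical priors for two independent root causes (values are made up).
PRIOR = {"a": 0.05, "b": 0.10}
# Hypothetical P(symptom = fault | cause a, cause b).
P_SYMPTOM = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.80, (False, False): 0.01}


def joint(a, b, s):
    """Joint probability of one full assignment of (a, b, s)."""
    pa = PRIOR["a"] if a else 1 - PRIOR["a"]
    pb = PRIOR["b"] if b else 1 - PRIOR["b"]
    ps = P_SYMPTOM[(a, b)] if s else 1 - P_SYMPTOM[(a, b)]
    return pa * pb * ps


def posterior_cause_a(evidence):
    """P(a = True | evidence); `evidence` maps 'b' and/or 's' to observed booleans."""
    numerator = denominator = 0.0
    for a, b, s in product([True, False], repeat=3):
        state = {"a": a, "b": b, "s": s}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(a, b, s)
        denominator += p
        if a:
            numerator += p
    return numerator / denominator


before = posterior_cause_a({"s": True})             # symptom observed
after = posterior_cause_a({"s": True, "b": False})  # test also rules out cause b
print(f"before test: {before:.4f}, after test: {after:.4f}")
```

With these numbers, ruling out cause b raises the posterior of cause a, because a becomes the main remaining explanation for the symptom.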
4. Making Troubleshooting Decision
Troubleshooting decision making determines the repair action to be conducted in each troubleshooting step. MCDA is used to calculate the utility values of the candidate actions, integrating the influence of the four criteria. The action with the highest utility value is selected. One of the MCDA evaluation methods, the crude multiattribute utility approach [14], is adapted in this work. Each criterion is assumed to be measurable on a ratio scale. The value of a criterion $g_i$ is normalized onto the scale [0, 1], where 0 represents the “worst” influence for the criterion and 1 represents the “best.” For example, the criterion risk is measured by a risk probability mapped onto [0, 1] so that a higher value is better: an action with risk = 0.8 carries a lower risk than one with risk = 0.6. Likewise, the criterion time in minutes is normalized onto [0, 1], and an action with time = 0.8 takes less time to complete than an action with time = 0.6.
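How raw values are mapped onto [0, 1] is not specified here. One simple possibility, shown purely as an assumption, is linear scaling between a chosen “worst” and “best” raw value. The function name `normalize_lower_is_better`, the bounds, and the repair times are all hypothetical.

```python
def normalize_lower_is_better(raw, worst, best):
    """Map a raw value linearly onto [0, 1]: `best` maps to 1 and `worst` to 0."""
    if worst == best:
        raise ValueError("worst and best must differ")
    score = (worst - raw) / (worst - best)
    return min(1.0, max(0.0, score))  # clamp values outside the chosen range


# Hypothetical repair times in minutes; 120 min is taken as worst, 10 min as best.
times = {"repair_a": 30, "repair_b": 90}
scores = {name: normalize_lower_is_better(t, worst=120, best=10)
          for name, t in times.items()}
print(scores)
```

The same mapping could be applied to cost or to a risk probability, since for each of these a smaller raw value is preferable.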
Table 5 gives the values of the four criteria, mapped onto [0, 1], for the first step, which decides which fault category to repair. For example, among the four repair actions, the “best” value of risk is 0.8 (repair Others) and the “best” value of time is 0.7 (repair Fire_Fault). The values of the uncertain criteria are obtained from the BN.
Table 5: Criteria values $g_i$ normalized to the [0, 1] scale in Step 1.
MCDA evaluates the influence of multiple criteria by calculating a utility value in which each criterion carries a different weight. Each criterion is assigned a weight $u_i$ that represents its importance, as explained in Section 2.2. The overall utility $U(a)$ of an action $a$ is calculated as the weighted sum
$$U(a) = \sum_{i} u_i\, g_i(a) \tag{9}$$
Table 6 shows the calculated utility values of the four actions in the first decision-making step. As shown in Table 6, troubleshooters should address the Fire_Fault category in the first step, since it has the highest utility value, 0.684.
Table 6: Weighted utilities of actions in Step 1.
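The weighted-sum evaluation and the selection of the highest-utility action can be sketched as follows. The weights assume the rank-sum rule, and the action names and criteria values are hypothetical; they are not the entries of Tables 5 and 6.

```python
def utility(criteria_values, weights):
    """Weighted sum of normalized criteria values for one action."""
    return sum(weights[c] * criteria_values[c] for c in weights)


# Assumed weights (rank-sum rule); the paper's actual weights may differ.
weights = {"fault_probability": 0.4, "risk": 0.3, "time": 0.2, "cost": 0.1}

# Hypothetical normalized criteria values in [0, 1], where 1 is best.
actions = {
    "repair_A": {"fault_probability": 0.6, "risk": 0.5, "time": 0.7, "cost": 0.4},
    "repair_B": {"fault_probability": 0.8, "risk": 0.3, "time": 0.4, "cost": 0.6},
}

utilities = {name: utility(values, weights) for name, values in actions.items()}
best_action = max(utilities, key=utilities.get)
print(utilities, best_action)
```

In this made-up case the action with the higher fault probability is not selected, because its poorer risk and time values outweigh that advantage in the weighted sum.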
Since the decision was made to repair Fire_Fault in Step 1, the set of actions in the second troubleshooting step contains repair FireTime_fault, repair FireSignal_break, and repair FireCore_fault. Table 7 gives the four criteria values normalized to [0 1] for each action in the second step.
Table 7: Criteria values normalized to the [0, 1] scale in Step 2.
Table 8 shows the calculated utility values of the three actions in the second decision-making step. As shown in Table 8, the action repair FireCore_fault has the highest utility value, 0.568, so the troubleshooting decision is to repair FireCore_fault.
Table 8: Weighted utilities of actions in Step 2.
5. Conclusions
Fault troubleshooting aims to diagnose and repair faults with the highest efficacy and at minimum cost. The efficacy depends on multiple criteria, including the fault probability, cost, time, and risk of a repair action. This paper proposes a novel fault troubleshooting decision method based on a Bayesian network and MCDA. The approach selects the most sensible repair action in each troubleshooting step, thereby enabling cost-saving, highly efficient, and low-risk troubleshooting. The paper uses an automobile engine start-up failure as a case study, but the approach is applicable to troubleshooting generic electromechanical products. In summary, the approach consists of the following steps: (1) identify the set of possible actions; (2) identify the set of criteria, that is, the attributes of the actions that determine the choice; (3) separate uncertain criteria, such as fault likelihood, from certain criteria, such as time, cost, and risk; (4) build a Bayesian network model to infer the uncertain criteria; (5) determine the value and weight of each criterion, with criterion values normalized to a [0, 1] scale; (6) calculate the overall utility value of each action using the weighted sum method; (7) make the decision based on the utility values obtained, giving higher priority to actions with greater utility.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was sponsored by National Natural Science Foundation of China (project No. 61374197), Science and Technology Commission of Shanghai Municipality (project No. 13510502600) and supported by the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning.
