Abstract
The integrated navigation system relies heavily on the accuracy of sensor measurements, which are susceptible to unknown disturbances. To improve the reliability and safety of the navigation system, there is an increasing need for fault detection of the sensors. In the present study, a hybrid data-driven fault detection strategy based on residual sequence analysis is proposed. The one-class support vector machine is currently one of the most popular fault detection methods for navigation systems, with many successful applications. It is therefore combined with a time-series similarity measure and a modified principal components analysis approach. The similarity between the multi-sequence residuals of a real-time sample and those of normal condition samples is computed to construct learning features for the one-class support vector machine. Similarly, the modified principal components analysis scheme projects the residuals onto subspaces to obtain learning features. The one-class support vector machine model then performs abnormal detection when unexpected sensor faults appear in the measurements and residuals. Finally, experiments are carried out to evaluate the performance of the proposed strategy for abrupt faults and soft faults on navigation sensors. Experimental results show that the hybrid data-driven fault detection strategy can effectively detect these faults with short time delay and high accuracy.
Introduction
With the considerably increasing demands for reliability and stability in complex multi-sensor systems, fault detection (FD) has become an essential research field to ensure the precise and accurate performance of sensors. In addition to the FD research hotspot in industrial processes, the data-driven FD methodology, which originates from insufficient knowledge about complex systems and unknown fault types, 1–3 has attracted many scholars. Recently, the model-based FD method has been applied in diverse applications, and remarkable results have been achieved for process monitoring and system fault diagnosis.4,5 However, the model-based FD method needs physical and mathematical prior knowledge of the system. Moreover, when the structure of the engineering system is complicated and the operational requirements vary in different conditions, the model-based approaches confront significant limitations for obtaining satisfactory FD results. 6 In order to resolve this problem, a data-driven strategy has been proposed to collect the data of various operating states. The normal input/output variables and their correlations in this condition can be obtained based on data training of a “particular” operation state. Then, the trained model can be used for process monitoring and abnormal detection. Reviewing the literature indicates that data-driven FD schemes have attracted many scholars in recent years for investigating dynamic systems.
The integrated navigation system, which plays a significant role in various carriers, including aircraft, vehicles, ships, and modern weapon systems, should provide accurate geographical position, velocity, and attitude information. 7 In order to maintain the navigation sensors (i.e. gyroscopes, accelerometers, and global positioning system (GPS) receiver) at stable performance and monitor the corresponding estimation states, many FD strategies have been proposed so far by researchers.8–18 Conventional model-based approaches and data-driven approaches were proposed accordingly for intelligent FD and fault diagnosis. 8 Moreover, considering extensive applications of unmanned operation systems and complicated integrated navigation systems, finding a reliable real-time FD method is of great importance. The integrated navigation system faults can be mainly divided into two categories: (1) abrupt faults caused by hardware failures or strong impulse disturbance, and (2) soft faults that widely exist in inertial sensors, which may come from severe drift errors. Abrupt faults cause a serious deviation of the navigation system in a short time. However, these faults can be detected by simple analysis. On the other hand, soft faults that affect other subsystems or even the whole system through slow changes can hardly be detected and isolated. This is especially pronounced when soft faults originate from minor errors and there is insufficient knowledge about these faults. In order to resolve this shortcoming, investigating the FD of integrated navigation systems has received significant attention in the past decades.9–11
The existing FD methods for the navigation systems can be generally divided into two groups, including analytical model-based methods and data-driven methods. 9 Analytical model-based methods depend on the constructed physical model. However, analytical methods require lots of prior knowledge such as accurate parameters of the dynamic model.12,13 For instance, the state chi-square test (SCST) is the most classic method for an integrated navigation system that detects the fault through a constructed statistic between measurements and predictions of the recursion filter. Monteriu et al. 13 presented a model-based multiple sensors fault detection and isolation (FDI) by using the “structural analysis” that includes the residual generation and ad hoc residual evaluation for unmanned ground vehicles (UGVs). Moreover,
By contrast, the data-driven method solves FD problems by multivariate statistical methods and by training machine learning models on the historical input/output dataset. From this point of view, the residual chi-square test, which detects the various faults by a mathematical statistics method, can be classified as a data-driven method. Although the residual chi-square test has more reasonable dynamic and real-time performance, it lacks sensitivity to soft faults and relies heavily on system parameters. By comparison, the intelligent methods that depend on a nonanalytic model, such as the artificial neural network (ANN), support vector machine (SVM), Markov models (MMs), or other models, provide powerful approaches to implement data-driven FD. Reviewing the literature shows that studies on data-driven FD methods for integrated navigation systems have received much attention recently. Guo et al. 16 proposed an active FD method based on one-class support vector machine (OC-SVM) and deep neural network (DNN). They effectively applied the OC-SVM to detect faults of navigation sensors, and the DNN predicts the running data to replace fault time data. Moreover, Zhao et al. 17 established an FD model by using the belief rule base (BRB). Then, an expectation–maximization (EM) algorithm is adopted for recursive parameter estimation and online update. The model can track the fault state and perform FD in real time. Xu and Lian 18 proposed a multi-channel single-dimensional fully convolutional neural network (MS-FCN) FD method. This method extracts the features from measured residual sequences of the sensors and discriminates the operating state with the prior information. These methods use sensor sampling data or residual data directly as the training dataset of the mathematical model.
However, FD is a challenging task for an integrated navigation system due to its complicated multi-sensors, limited training dataset, and lack of prior knowledge of fault states. It is worth noting that as the accuracy of the recursive filter algorithm is improved, it is expected to make full use of residual data to implement FD in a data-driven way.19,20 Therefore, hybrid methods that combine model-based and data-driven approaches have been proposed in the literature. For example, Liu et al. 21 combined the SCST with the simplified fuzzy Adaptive Resonance Theory Map (ARTMAP) neural network (SFAM) to overcome the problem of FD in a noisy environment.
According to these studies and their successful applications in the navigation system, it is intended to propose a new hybrid data-driven FD strategy. The proposed strategy combines the OC-SVM model with time-series similarity measurement (SIM) and modified principal components analysis (MPCA) approaches. The residual sequence from the Kalman filter (KF) is preprocessed using the SIM and MPCA approaches to capture accumulated fault errors. Then, the dataset of the normal condition is collected and used to train the SVM model for detecting sensor faults. The proposed data-driven FD method for the integrated navigation system is formulated as an abnormal detection problem for the case where prior knowledge of fault types is difficult to obtain. It is expected that the proposed strategy can provide a real-time FD method based on status monitoring for both abrupt and soft faults.
The present study is organized as follows: The “Framework of hybrid data-driven FD” section presents the hybrid data-driven FD framework for the integrated system. Then, the SIM and MPCA methods for analyzing the multi-sequence residuals are introduced in the “Residual characterization” section. In the “FD based on OC-SVM” section, the FD is formulated as an abnormal detection problem using OC-SVM. Experimental validation of the proposed strategy is presented in the “Simulation experiments and results” section. Finally, concluding remarks are given in the “Conclusion” section.
Framework of hybrid data-driven FD
Figure 1 illustrates the main framework of the hybrid data-driven FD strategy for the integrated system. It is observed that the FD sub-filter consists of the measurement, preprocessing, and FD units. The measurement unit provides the sampling data of sensors (inertial measurement unit and GPS) in real time and residual data of KF. When the system is operating at the normal condition, the residual sequences of KF are stored as multivariate time-series dataset beforehand and then these datasets are processed as the training dataset

Figure 1. Main framework of data-driven fault detection strategy.
It should be indicated that the module formed by the KF, the SIM and MPCA processor, and the OC-SVM model can be considered as a sub-filter in larger integrated navigation systems. 16 Meanwhile, the sub-filter design can be applied in a federated filter structure. The FD module serves as a filter and controls a switch to determine whether the connected sensor is in good condition or not. The main filter of the navigation system can then adjust the filtering mode to generate reliable navigation data. Therefore, the FD module plays a significant role in the integrated navigation system. Furthermore, the hybrid data-driven FD method provides a basis for fault diagnosis and fault-tolerant techniques to meet the reliability requirements of navigation sensors.
Residual characterization
Residual characterization based on SIM
In the residual chi-square test method, the soft faults coincide with minor errors. Moreover, the forecasting value
Assume that the dynamic model of a discrete integrated navigation system with a fault can be formulated in the form below
where
where
The recursive state vector
In the residual chi-square test method, a statistic is constructed using predicted measurements
where
Equation (5) indicates that the dataset
When the system performs the actual navigation task, a real-time residual at each discrete epoch k is generated based on the local filter and the obtained result is stored as multi-sequence
Then, a elements of the residual should be selected from normal multi-sequence dataset
Subsequently, the DTW distance between the same variables of
Based on the measured distance through the DTW method, accumulative error during fault occurring time can be integrated into similarity measurement
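As an illustration of the DTW step, the following is a minimal sketch of a textbook dynamic time warping distance applied variable by variable to two multi-sequence residual windows. The function names `dtw_distance` and `similarity_features`, the absolute-difference local cost, and the example values are assumptions made for illustration; the exact similarity measurement used in this study is not reproduced here.

```python
from math import inf


def dtw_distance(x, y):
    """Classic dynamic time warping distance with |a - b| as local cost."""
    n, m = len(x), len(y)
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in x only
                                 cost[i][j - 1],      # step in y only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def similarity_features(realtime, normal):
    """One DTW distance per residual variable (hypothetical feature layout)."""
    return [dtw_distance(r, s) for r, s in zip(realtime, normal)]


# Two residual variables, window length 4 (illustrative numbers only).
normal = [[0.0, 0.1, -0.1, 0.0], [0.0, 0.0, 0.1, 0.0]]
drifting = [[0.0, 0.3, 0.6, 0.9], [0.0, 0.0, 0.1, 0.0]]
features = similarity_features(drifting, normal)
```

In this sketch the first variable drifts away from the normal window, so its distance grows with the accumulated deviation, while the unchanged second variable yields a distance of zero.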
A modified PCA for residual characterization
Studies show that the PCA method, which was originally proposed for dimension reduction, is a basic and efficient statistical method that can effectively extract and preserve a significant amount of information about the data variability. Moreover, the PCA method has a simple structure, which is appropriate for handling a large number of stationary process data with Gaussian distributed variables. Furthermore, the PCA scheme has been widely and successfully employed as a multivariate statistical tool in many status monitoring and fault diagnosis applications.25–27 Based on a hybrid linear–nonlinear statistical modeling, Deng et al. 28 proposed a serial PCA (SPCA) for nonlinear process monitoring. Furthermore, Peng et al. 29 reported a kernel independent and principal components analysis (kernel ICA-PCA) for the hot strip mill process. As an effective data-driven FD and diagnosis tool based on multivariate statistical process monitoring, PCA and its extensions have been investigated by many researchers. 1 In this section, a modified PCA is proposed to obtain a residual characterization vector that efficiently characterizes residuals as learning features for the SVM method.
Similar to the SIM method, a recorded residual dataset D at the normal condition is collected with zero mean and normalized with the unit variance for training purposes. In the proposed hybrid FD framework, the multivariate dataset L can be shared by SIM and PCA methods as follows
The covariance matrix is defined as
Then, singular value decomposition (SVD) is performed on the covariance matrix
where
where
The matrix
Similarly, the modified PCA method can be effectively applied to characterize accumulative errors in residual sequences in the vector
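For orientation, the standard PCA relations that this subsection builds on can be sketched as follows. The symbols (normalized data matrix $X$ with $n$ samples and $m$ variables, loading matrices $P_{pc}$ and $P_{res}$, and a new normalized residual sample $x$) are assumptions chosen for illustration, and the specific modification made in the MPCA method is not reproduced here.

```latex
% Sample covariance of the zero-mean, unit-variance data matrix X (n x m)
C = \frac{1}{n-1} X^{\mathsf{T}} X
% Decomposition of the symmetric covariance matrix
C = P \Lambda P^{\mathsf{T}}, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_m), \quad
\lambda_1 \ge \dots \ge \lambda_m \ge 0
% Split of the loadings into principal and residual subspaces
P = \begin{bmatrix} P_{pc} & P_{res} \end{bmatrix}
% Projections of a new sample x onto the two subspaces
\hat{x} = P_{pc} P_{pc}^{\mathsf{T}} x, \qquad
\tilde{x} = \left( I - P_{pc} P_{pc}^{\mathsf{T}} \right) x = P_{res} P_{res}^{\mathsf{T}} x
```

Because the covariance matrix is symmetric and positive semidefinite, its SVD coincides with its eigendecomposition, which is why a single orthogonal matrix $P$ appears on both sides.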
Theoretical analysis for residual characterization
As mentioned in the foregoing sections, the residual characterization based on the SIM and MPCA methods can be applied to construct the learning features for the SVM method, which drives the OC-SVM model to implement abnormal detection. The common advantage of these two methods is the ability to characterize error
Assume that real-time multi-residual sequences at fault-free condition
When the multi-sequences
Comparing the sequences
Similarly, the multi-residual sequences
Since fault errors affect the results of features learned by the PCA model, the elements of characterized vectors
FD based on OC-SVM
FD based on the OC-SVM method is an anomaly detection approach. It is one of the most popular data-driven FD methods with wide applications in diverse areas.30–33 Studies show that this technique is especially effective in situations where normal operation samples are easily accessible, while fault samples are expensive to obtain. Therefore, since prior knowledge of unknown faults is rare, the OC-SVM method is a powerful scheme for FD of the multi-sensor navigation system.
OC-SVM
The OC-SVM method is a kernel-based method built on support vector description, with a training dataset (target class) consisting of positive examples only. It computes the smallest sphere in the feature space and finds a unique optimal hyperplane that separates the training dataset from the origin with maximum margin. In other words, the origin is treated as an outlier from the target class. In the proposed hybrid data-driven FD strategy, the characterization vectors of normal condition dataset
Then, the optimal hyperplane is described as the following
where
where
where
The kernel function induces the OC-SVM working in the feature space and we focus on RBF kernel in our strategy. After obtaining the optimal solution
where
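To make the role of the RBF kernel and the resulting decision rule concrete, the following is a minimal sketch of how a trained OC-SVM is commonly evaluated on a new feature vector, using the standard form f(x) = Σ αᵢ K(xᵢ, x) − ρ. The support vectors, coefficients, offset `rho`, and kernel width `gamma` below are hypothetical values chosen for illustration; they are not parameters from this study, and no training is performed.

```python
from math import exp


def rbf_kernel(u, v, gamma):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return exp(-gamma * sq_dist)


def decision_value(x, support_vectors, alphas, rho, gamma):
    """Standard OC-SVM decision value: sum_i alpha_i * K(x_i, x) - rho."""
    weighted = sum(a * rbf_kernel(sv, x, gamma)
                   for sv, a in zip(support_vectors, alphas))
    return weighted - rho


def is_abnormal(x, support_vectors, alphas, rho, gamma):
    """A negative decision value places x on the outlier side of the hyperplane."""
    return decision_value(x, support_vectors, alphas, rho, gamma) < 0


# Hypothetical model parameters (assumed, not trained).
support_vectors = [[0.0, 0.0], [0.1, -0.1]]
alphas = [0.5, 0.5]
rho = 0.5
gamma = 1.0

near_normal = is_abnormal([0.05, -0.05], support_vectors, alphas, rho, gamma)
far_away = is_abnormal([3.0, 3.0], support_vectors, alphas, rho, gamma)
```

With these assumed values, a feature vector close to the support vectors gives a positive decision value, while one far from them gives a kernel sum near zero and therefore a negative value.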
Abnormal detection algorithm
In practical applications, a statistic detection amount should be determined for the FD problem. More specifically,
where

Figure 2. Abnormal detection scheme based on OC-SVM.
Simulation experiments and results
Experiments setting and FD results
In this section, an inertial navigation system/global navigation satellite system (INS/GNSS) integrated navigation system of an unmanned aerial vehicle (UAV) is designed in the MATLAB environment to evaluate the validity of the proposed hybrid data-driven FD strategy. Both abrupt faults and soft faults are simulated on the integrated navigation system. The training dataset is initially generated by simulating the normal operation of the system. Then, several faults are injected into the navigation sensors successively at different times. The multi-sequence residuals of the fault condition are selected as the testing dataset. Table 1 shows the specifications of the UAV integrated navigation system. Moreover, Table 2 presents details of specific faults.
Table 1. Specifications of the simulated UAV integrated navigation system.
UAV: unmanned aerial vehicle; GNSS: global navigation satellite system; GPS: global positioning system.
Table 2. Details of specific faults.
GNSS: global navigation satellite system.
In order to obtain more ideal Gaussian distribution residual data, the data in the stable state of trajectory at normal conditions are collected as prior data. It should be indicated that each simulation is conducted twice with the same trajectory to obtain the training dataset in reasonable condition. The duration of each simulation is 20 min and
Based on the continuous residual samples, the feature vectors for training the OC-SVM model should be computed through the proposed SIM and modified PCA methods. In order to compare the obtained results from the two methods, the same parameter setting should be applied in both SIM and MPCA methods. More specifically, length a of multi-sequence residuals
After generating feature vectors, the OC-SVM model is trained with the RBF kernel
The faults listed in Table 2 are injected into the navigation sensors, and 600 groups of real-time samples for each fault are collected during the failure period. Figures 3 and 4 show the FD results of SIM + OC-SVM and MPCA + OC-SVM methods, respectively, where x- and y-axes represent the detection time and distance metric by the OC-SVM method, respectively. The obtained results reveal that both SIM + OC-SVM and MPCA + OC-SVM methods can successfully detect the faults with short delay time (DT). Moreover, for an abrupt fault, both SIM + OC-SVM and MPCA + OC-SVM methods can directly detect the fault without delay. However, for soft faults, a short DT of 4 to 11 s exists in the detection process, which is mainly caused by the insufficiency of error accumulation. Moreover, it is worth noting that some detection points fall below the detection threshold during the failure period. These points are presented in Figures 3(c), 4(b), and 4(d). However, this does not affect the FD effectiveness of the proposed method. Therefore, data of navigation sensors should be verified prior to use. To this end, experiments are carried out to verify the validity of SIM + OC-SVM and MPCA + OC-SVM methods.

Figure 3. Fault detection results based on SIM + OC-SVM: (a) abrupt fault detection based on SIM + OC-SVM, (b) soft fault (GNSS) detection based on SIM + OC-SVM, (c) soft fault (Gyro) detection based on SIM + OC-SVM, and (d) soft fault (accelerometer) detection based on SIM + OC-SVM.

Figure 4. Fault detection results based on MPCA + OC-SVM: (a) abrupt fault detection based on MPCA + OC-SVM, (b) soft fault (GNSS) detection based on MPCA + OC-SVM, (c) soft fault (Gyro) detection based on MPCA + OC-SVM, and (d) soft fault (accelerometer) detection based on MPCA + OC-SVM.
Comparison study with HS
Based on the foregoing section, it is found that SIM + OC-SVM and MPCA + OC-SVM methods can achieve effective FD with a short time delay. In this section, several other OC-SVM FD methods for the navigation system are applied to evaluate the FD efficiency of the hybrid strategy (HS). To this end, the OC-SVM method based on the phase space reconstruction (PSR + OC-SVM) 16 and the OC-SVM method based on multiple kernel anomaly detection (MKAD + OC-SVM) 37 are applied in the navigation experiment. Commonly used FD indices, including fault detection rate (FDR) and false alarm rate (FAR), are first introduced as the performance evaluation standards 38
where
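The two indices can be illustrated with a minimal sketch that uses their commonly adopted definitions: FDR as the percentage of fault samples that raise an alarm, and FAR as the percentage of fault-free samples that raise an alarm. The function name `fdr_far`, the 0/1 encoding of alarms and labels, and the example data are assumptions for illustration and may differ in detail from the definitions cited above.

```python
def fdr_far(alarms, fault_labels):
    """Return (FDR, FAR) in percent from per-sample alarms and ground truth.

    alarms[i] is 1 if the detector flagged sample i, else 0.
    fault_labels[i] is 1 if sample i was taken during a fault, else 0.
    """
    if len(alarms) != len(fault_labels):
        raise ValueError("alarms and fault_labels must have the same length")
    n_fault = sum(1 for f in fault_labels if f)
    n_normal = len(fault_labels) - n_fault
    if n_fault == 0 or n_normal == 0:
        raise ValueError("need at least one fault and one fault-free sample")
    detected = sum(1 for a, f in zip(alarms, fault_labels) if a and f)
    false_alarms = sum(1 for a, f in zip(alarms, fault_labels) if a and not f)
    return 100.0 * detected / n_fault, 100.0 * false_alarms / n_normal


# Illustrative data: 4 fault-free samples followed by 4 fault samples.
alarms = [0, 0, 1, 0, 1, 1, 1, 0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
fdr, far = fdr_far(alarms, labels)  # 3 of 4 faults detected, 1 of 4 false alarms
```

In this example three of the four fault samples are flagged and one of the four fault-free samples raises an alarm, giving an FDR of 75% and a FAR of 25%.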
The proposed methods, SIM + OC-SVM and MPCA + OC-SVM, use the multi-sequence residuals to construct the learning features for OC-SVM. Abnormal detection methods using continuous data have been the mainstream in the past decade. MKAD + OC-SVM, which is derived from multiple kernel learning, is likewise a data-driven method that uses multivariate continuous data to detect anomalies. The resultant kernels can be constructed over discrete sequences and discretized continuous time series, from which OC-SVM constructs an optimal hyperplane. Constructing the kernels amounts to measuring the similarity between the discrete sequences, in other words, finding a representation of the time series that is inversely proportional to the distance. This is similar to the proposed SIM + OC-SVM method. The comparison is intended to verify whether the MKAD + OC-SVM method can find the similarity in the multivariate residual sequences. By contrast, PSR + OC-SVM uses a single sample for detection rather than multi-sequence data. It is included to test whether the faults can be detected by constructing features at one point. In the PSR + OC-SVM method, several dimensional features are constructed from time-series navigation signals for OC-SVM training, and in the detection stage a sample point x is mapped into the feature space.
In the simulation, 50 groups of the real-time multi-sequence residual datasets are included for comparison. The injected faults are selected from Table 2, and the corresponding parameters are set according to the “OC-SVM” section. In other words, the length of the multi-sequence is
Table 3. FDR and FAR (%) results of different faults (Table 2) utilizing all the methods.
The comparison study demonstrates that the proposed SIM + OC-SVM and MPCA + OC-SVM methods offer high FDRs and low FARs in contrast with PSR + OC-SVM and MKAD + OC-SVM, especially for soft faults. According to the HS results in Table 3, the HS achieves higher FDRs than any other method, at the cost of higher FARs. The fourth and sixth columns of Table 3 show the FARs of the SIM + OC-SVM and MPCA + OC-SVM methods, which give the lowest false detection among all methods together with better FD results. In comparison, the proposed methods have superior characteristics in detecting accumulative error from successive residuals.
Study on residual sequence length
Since the length of the multi-sequence residuals affects the FD results, this parameter is investigated in this section. In order to balance sensitivity and FDR performance, an appropriate length of the residual sequence should be selected for constructing features. With too short a sequence, faults are not detected easily, whereas too long a sequence causes a larger DT. Therefore, another simulation test is performed with different design parameters for SIM + OC-SVM and MPCA + OC-SVM. The soft fault of an accelerometer is injected into the system, with the length of the multi-sequence residuals ranging from 5 to 20. Table 4 summarizes the detailed FDRs, FARs, and time-delay indices of the simulation results.
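One way to picture why the sequence length trades off against delay is a sliding window over the residual stream: a feature is built from the latest a residuals, so a window made up entirely of post-fault samples first appears a − 1 epochs after fault onset. The sketch below is a simplified illustration under that assumption; the generator name `sliding_windows` and the synthetic residual values are hypothetical, and the actual delay also depends on fault magnitude and the detection threshold.

```python
from collections import deque


def sliding_windows(residuals, a):
    """Yield (k, window) for each epoch k once a residuals have been buffered."""
    buf = deque(maxlen=a)  # the oldest sample drops out automatically
    for k, r in enumerate(residuals):
        buf.append(r)
        if len(buf) == a:
            yield k, list(buf)


# Synthetic single-variable residual: fault-free up to epoch 9, biased from epoch 10.
onset = 10
residuals = [0.0] * onset + [0.5] * 10
a = 5

# First epoch at which every sample in the window was taken after fault onset.
first_full = next(k for k, w in sliding_windows(residuals, a)
                  if all(v != 0.0 for v in w))
```

Here `first_full` equals `onset + a - 1`, which shows how a longer window postpones the point at which the feature reflects only faulty data.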
Table 4. FDRs, FARs (%), and DT (s) based on different design parameters.
FDR: fault detection rate; FAR: false alarm rate; SIM: similarity measure; OC-SVM: one-class support vector machine; MPCA: modified principal components analysis; DT: delay time; The bold indicates the optimal result obtained by the method in this index.
According to the results of FDRs and FARs given in Table 4, it is observed that the design parameters, including the length of the multi-sequence residual, significantly affect the FD performance of SIM + OC-SVM and MPCA + OC-SVM methods. The length of the multi-sequence residual determines the accumulative error captured in the learning features, so the performance of the HS varies with the length. In the column of FDRs, all lengths of multi-sequence residuals obtain similar FD performance after exceeding 15 points, which means that a sufficient sequence length is essential to ensure FD results. The FARs column shows that a length that is too long or too short causes more false alarms. Therefore, an appropriate length (
Conclusion
In the present study, a hybrid data-driven FD strategy is proposed. The proposed strategy is based on multi-sequence residual analysis and OC-SVM, which is applied to navigation sensors. First, the basic data-driven fault diagnosis methods and their recent developments are reviewed. Then, the HS framework is presented and the FD is formulated as an outlier-detection problem. The SIM and modified PCA are adopted to construct the learning features in which the fault errors over a period of time can be accumulated. Moreover, OC-SVM is applied for implementing outlier detection by training the learning features.
Furthermore, the proposed strategy is validated on the simulated integrated navigation system. The training dataset is obtained under fault-free conditions and four typical faults are added to the simulation system. The experimental results show that both SIM + OC-SVM and MPCA + OC-SVM methods can detect the abrupt and soft faults with high accuracy in real time. The HS can improve the FD rate by paying a small false alarm cost. Furthermore, the selection of the multi-sequence residual length in SIM + OC-SVM and MPCA + OC-SVM methods is discussed. Compared with previous studies, the data-driven FD strategy is more efficient and accurate. In the near future, it is intended to validate this method on real navigation sensors and integrate it with other FD approaches to improve reliability and stability.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China (61501493).
