Abstract
Task synthesis, function fusion, and resource integration are used to get higher performance and more reliability in integrated modular avionics system. However, safety issue is caused in the system integration process. In order to discover the potential hazards for function layer safety analysis, in this paper, we proposed an efficient biclustering algorithm—DeCluster, which can effectively mine all biclusters with maximal variant usage rate or maximal low usage rate in real-valued function-resource dataset. First, it constructs a sample weighted graph, which contains all resource relations set between two samples that satisfy the definitions; second, all maximal variant usage rate or low usage rate biclusters are mined in the graph. In order to get higher the mining efficiency of the algorithm, DeCluster algorithm uses several pruning strategies. The experimental results show our algorithm is more efficient.
Introduction
In recent years, integrated modular avionics (IMA) has largely enhanced the performance of aircrafts and become one of the major research interests in the air transport industry. However, IMA system integration process caused some safety issues: (1) resource integration generated failure spread, (2) function information fusion generated failure implication and chaos, and (3) task synthesis generated failure damages expansion. Therefore, it is necessary to analyze the safety 1 in IMA system.
In IMA system, function architecture is defined by well-formed organization, which is based on scheduling policy. According to the sensor data, the function operation and the system functions can be organized dynamically. The functional framework is defined by the corresponding task unit. Such organization is evaluated by its performance of completing the corresponding task. Different from the process introduced in ARP 4754, the functions are not addressed directly from the conceptual studies, but from requirements of tasks. The relation between functions and resources is denoted as a function-resource matrix, in which each row denotes a resource and each column denotes a function, the value is the use degree of one function to one resource. For example, for the resource whose computing capability is 100 MHz, function F1 needs 60 MHz computing capability to compute some results. The dependence degree of this function on this computing resource is 0.6. Through mining the above function-resource matrix, the usage relation between a group of functions and a group of resources can be gained. Given two functions F1F2, the resource relations called by each function are as follows: F1==>R1R2, F2==>R2R3R4. Assuming that F1F2 needs to cooperate to complete a task T. All above both functions may be called at the same time. For resource R2, it supports F1 and F2 simultaneously. There may have two conditions: (1) R2 has high effectiveness for F1, but has low effectiveness for F2; (2) R2 has high effectiveness for both F1 and F2. The safety degree of the first condition is higher than that of the second one. The reason is that resource R2 can serve F1 and F2 simultaneously in the first condition; while in the second condition, resource R2 needs to serve for two functions. From the perspective of functional safety, if resource R2 has defects, its influence on the first condition is lower than the second condition. So, through function-resource matrix mining, in order to achieve a group of functions, the resources which can satisfy all functional demands simultaneously and the resources which can satisfy all functional demands through multiple accesses can be mined, i.e. mining bicluster with variant usage rate or low usage rate in function-resource matrix. Through function-resource matrix mining, the resources satisfying functional demands simultaneously can be mined. Therefore, mining biclusters with variant usage rate or low usage rate in function-resource matrix can be used to analyze the safety between functions and resources. So the safety of function can be obtained by the safety of resources.
Biclustering 2 is a method for clustering condition set and gene set simultaneously. Therefore, biclustering algorithms are mainly used for biological analysis. Cheng and Church 2 used a low mean squared residue to generate biclusters. In Madeira and Oliveira, 3 biclustering algorithms for biological analysis are presented. However, above algorithm’s mining efficiency is not well. MicroCluster 4 uses weighted directed range multigraph to generate deterministic bicluster. SAMBA 5 is designed to mine constant value biclusters. OPSM 6 can find biclusters with coherent trends of up or down regulation. Biclusters with constant columns can be inferred by xMotifs. 7 Range Support is used by RAP 8 for mining constant row biclusters which uses range support measure to mine the meaningful patterns which are coherent for a substantial fraction of transactions or samples in the dataset. It adopts gene growth method to mine constant row biclusters. During mining process, RAP generates (n+1)-level biclusters from n-level biclusters by using range support criterion. However, it does not use efficient pruning techniques, so the mining efficiency is not well.
In order to use biclusters for disease detection, some biclustering methods are used to mine differential coexpression biclusters which are corrected coexpression in one microarray but not in the other. Okada and Inoue 9 and Serin and Vingron 10 used two-steps approach to infer differential coexpression biclusters, but the efficiency is low. Therefore, Wang et al. 11 and Southworth et al. 12 constructed a difference matrix to mine discriminative bicluster. DiBiCLUS 13 algorithm mines differential coexpression biclusters in two discretized gene expression datasets. Subspace differential coexpression (SDC) algorithm 14 is another method for mining SDC patterns. It infers the patterns which are coexpressed over a large percent of the conditions in one microarray dataset, but in a much smaller percent of conditions in the other. However, above biclustering algorithms cannot mine biclusters with variant usage rate or low usage rate simultaneously.
According to above analysis and in hopes of overcoming the limitations of biclustering methods to mine the bicluster with variant usage rate or low usage rate at the same time in the real-valued function-resource matrix. In this paper, we propose a new biclustering algorithm—DeCluster—to mine all biclusters with maximal variant usage rate or maximal low usage rate simultaneously in the real-valued function-resource matrix. First, it constructs a sample weighted graph, which contains all resource relations set between two samples that satisfy the definitions; second, all maximal variant usage rate or low usage rate biclusters are mined in the graph. In order to improve the mining efficiency of the algorithm, DeCluster algorithm uses several pruning strategies to mine maximal bicluster without candidate maintenance.
Problem definition
Biclustering algorithm was proposed for microarray dataset analysis first. The microarray dataset is defined as a gene expression matrix in real expression numbers, denoted as
A bicluster B is defined as a submatrix of D, where B = X×Y with
A real-valued function-resource matrix.
A real-valued variant usage rate function-resource matrix.
A real-valued nonvariant usage rate function-resource matrix.
A real-valued low usage rate function-resource matrix.
The safety analysis degree in the first and the third conditions is more important than the second condition. The reason is that, in the first condition and the third condition, the resource R2 can satisfy F1 and F2 simultaneously. In the third condition, resource R2 needs to satisfy the two functions, respectively. Our proposed DeCluster algorithm is to mine the first condition and the third condition.
It can be obtained from the description in Definition 1 that α and β are used to restrict resources with a low usage rate producing each function, e.g. bicluster F2F3F4(R1R3) in Table 1.
It can be obtained from the description in Definition 2 that at least one resource in bicluster of variant usage rate of resources satisfies the conditions in the formula in Definition 2 under two functions and meanwhile this resource satisfies the conditions in the formula in Definition 1 under other functions. For the convenience of description, such resources are defined as resources with variant usage rate as below:
Therefore, resources in bicluster with variant usage rate of resources must be those meeting Definitions 1 and 3. We will define the relationship among resources. The relationship among resources in true data and that among resources in discrete data have the same form of expression and only minor differences in definition.
Therefore, each resource in bicluster mined with DeCluster algorithm meets the first or second condition in Definition 4 under all functions. To improve the mining efficiency of the algorithm, DeCluster algorithm mines biclusters with maximal variant usage rate or maximal low usage rate in real-valued function-resource matrix by using sample growth method without candidate maintenance. The mining process of this algorithm will be introduced in the next section.
The DeCluster algorithm
Constructing sample relational weighted graph
Using undirected sample relational weighted graph for mining biclusters was used in Wang et al.11,15 The reason is that, if the number of samples is far less than the number of items, it can improve mining efficiency. Therefore, our DeCluster algorithm will adopt undirected sample relational weighted graph (hereinafter referred to as sample weighted graph) to mine biclusters with maximal variant usage rate or maximal low usage rate in the real-valued function-resource matrix.
According to the description in Definition 1, when the resources among functions satisfy the definition of variant usage rate, the weight between two functions does not meet commutativity. For instance, the weight under F1F2 is R1*R2R3, while the weight under F2F1 is R1*R2R3−R5. Therefore, in Definition 5, the weight of each edge is the weight under FiFj, where i<j. Figure 1 shows weight relationship graph corresponding to Table 1.
The sample weighted graph constructed from Table 1.
Mining maximal bicluster
According to descriptions in Definitions 1 and 2, when a new function is extended to the bicluster, it is necessary to calculate the intersection of all edges of the extending function and the resource collection of bicluster extended, thus ensuring that the resource collection under the extended function and that under existing functions satisfy constraint conditions in Definition 1 or 2. For the convenience of design of pruning strategies, it is required to not only calculate the intersection of resources, but also consider symbols before resources during sample growth and the calculation of intersection of weight, i.e. symbols before resources are also required for “intersection” calculation. Operational rules of these symbols can be obtained from the definition below.
It can be seen from Definition 3 that the calculation of intersection between “R1” and “*R1” will not occur. Therefore, its rules are not provided in Definition 6. During function extension, with the increase of functions, the calculation of intersection of multiple forms of expression of the same resource will occur. The intersection can be calculated according to operational rules described in Definition 6 according to the sequence. We will introduce how DeCluster algorithm uses pruning strategies to mine all biclusters with maximal variant usage rate or maximal low usage rate in sample relationship weight graph without candidate maintenance in detail. This paper will judge maximal bicluster with the method of prior detection proposed in [11] without candidate maintenance. That is to say, if resources under the current candidate sample and some prior candidate sample (mined sample) have some inclusion relation, i.e. all biclusters produced by the current candidate sample can be produced by some prior candidate sample and the current candidate sample can be pruned. During the pruning design of backward checking, if F1 is the prior candidate function of F2, the weight on two function edges is the resource collection information of F2F1 rather than F1F2. As resources under F1F2 and F2F1 have different forms of expression, the sample weighted graph made by this algorithm is a directed graph rather than undirected graph. For Fn and Fm, it is necessary to build edges on FnFm and FmFn, respectively. However, for FnFm and FmFn, the difference of weights on the edge is the interchange of resource expression forms “R1” and “*R1.” Therefore, for saving the storage space, the storage of weight is only that of weight on FiFi+1 edge. The weight on Fi+1Fi edge can be calculated with FiFi+1.
Resource R1 is, respectively, expressed as “R1” and “*R1” above when the form of expression of resources is illustrated, just for the convenience of design of pruning strategies. If a resource in the current candidate function to be extended satisfies the form of “R1,” this resource can be pruned according to the lemma below.
If a resource in the current candidate function to be extended meets the form of “*R1,” it should be judged whether this resource can be pruned according to the weight of prior candidate function. Therefore, the following lemma can be used for pruning.
Similarly, if a resource in the current candidate function to be extended meets the form of “−R1,” it should be judged whether this resource can be pruned according to the weight of prior candidate function. Therefore, the following lemma can be used for pruning.
It can be seen from Lemma 4 that, the candidate function can only be pruned if all resources in the candidate function can be obtained by resource extension in the same prior candidate function; otherwise, this candidate function will be extended.
The above output strategy is actually subject to the definition: if no successor or prior is its superset, it can be outputted. We will explain the algorithm mining process through an example. The data in the example are function-resource use relationship matrix shown in Table 1. It constructs the weighted graph among functions, as shown in Figure 1. The mining process for DeCluster mining Table 1 expressed matrix is shown in Figure 2. The specific description of DeCluster algorithm is as follows:
Example mining process of DeCluster algorithm.
Scanning dataset D and construct its weighted graph; For each pair of samples in D, if all samples linked the current extended sample satisfies pruning conditions in Lemma 4, then stop extending the current sample and change extend the next samples; If the current extending sample which linked the current extended samples do not satisfy pruning conditions in Lemma 4, then the current extending sample would be merged to the current extended samples; Go to (2) until all pairs of samples in D were extended.
Experimental result and analysis
The proportion of each value in three real-valued dataset.
In this section, the comparison will be made on the mining efficiency of DeCluster algorithm and RAP algorithm. To fully compare the scalability of algorithms, we generate multiple groups of datasets with different numbers of functions and resource in allusion to three datasets in Table 5. The selection of functions and resources is based on the order of functions and resources in dataset. The parameter of variant usage rate is 4 and that of low usage rate is 0.5. Figure 3(a) and (b) provides the comparison of performance period when the number of functions of two algorithms above is 10 and 20, respectively, and the number of resources is 200, 400, 600, 800, and 1000, respectively, and the parameter of relevancy under dataset D1 is 1. It can be seen from these figures that the mining time of both algorithms increases progressively with the increase of number of resources in dataset. Meanwhile, the mining efficiency of DeCluster algorithm is higher than that of RAP algorithm under each data size. Especially when the number of resources in dataset is high, the mining efficiency of DeCluster algorithm is nearly 20 times higher than that of RAP algorithm. The reason is that RAP algorithm mines bicluster with the method of resource extension. With the increase of number of resources in dataset, this algorithm needs more iterations to mine all biclusters meeting threshold conditions. However, DeCluster algorithm uses high-efficiency pruning strategies for mining and will produce more maximal biclusters especially when the number of resources in dataset is high and data are dense. Therefore, DeCluster algorithm has a higher pruning efficiency. Figure 4(a) and (b) provides the comparison of performance period under datasets with different resources of functions and resources when the parameter of relevancy of three algorithms above is 2 in dataset D1. Similar to the description in Figure 3(a) and (b), the mining efficiency of DeCluster algorithm is higher than that of RAP algorithm under each data size. When the number of resources in dataset is low, the pruning efficiency of DeCluster algorithm is not significantly higher than that of RAP algorithm. However, with the increase of number of resources in dataset, the pruning efficiency of DeCluster algorithm becomes significantly higher.
The running time comparison between two algorithms under different number of resources and functions in D1 when α = 1: (a) 10 functions and (b) 20 functions. The running time comparison between two algorithms under different number of resources and functions in D1 when α = 2: (a) 10 functions and (b) 20 functions.

Figures 5(a) and (b) and 6(a) and (b), respectively, provide the comparison of performance period of both algorithms above under datasets with different numbers of sampling sites and resources when their parameters of relevancy under dataset D2 are, respectively, 1 and 2. It can be seen that as proportions of 0.1 and 0.2 in dataset D2 increase compared to those in dataset D1, according to descriptions of the definition of variant usage rate and low usage rate, mining dataset D2 will produce more biclusters than mining dataset D1 under the same parameters. Therefore, when the number of functions is 20, RAP algorithm cannot mine datasets with the number of resources higher than 400 in limited memory space, but DeCluster algorithm can complete all mining processes within 10 s. Figures 7(a) and (b) and 8(a) and (b), respectively, provide the comparison of performance period of both algorithms above under datasets with different numbers of sampling sites and resources when their parameters of relevancy under dataset D3 are, respectively, 1 and 2.
The running time comparison between two algorithms under different number of resources and functions in D2 when α = 1: (a) 10 functions and (b) 20 functions. The running time comparison between two algorithms under different number of resources and functions in D2 when α = 2: (a) 10 functions and (b) 20 functions. The running time comparison between two algorithms under different number of resources and functions in D3 when α = 1: (a) 10 functions and (b) 20 functions. The running time comparison between two algorithms under different number of resources and functions in D3 when α = 2: (a) 10 functions and (b) 20 functions.



Conclusions
In order to use resource ability to computer the health degree of a set of functions, in this paper, we propose an efficient bicluster mining algorithm—DeCluster, to effectively mine all biclusters with maximal variant usage rate and maximal low usage rate in the real-valued function-resource matrix. The mining process of DeCluster algorithm is divided into two steps: first, scanning original function-resource matrix, according to the definition of biclusters with maximal variant usage rate and maximal low usage rate, all sample weighted graphs satisfying the above definition are produced; then, it uses sample growth method to mine all biclusters with maximal variant usage rate bicluster and maximal low usage rate bicluster. In order to improve the mining efficiency, DeCluster algorithm uses multiple pruning strategies to ensure the mining of maximal bicluster without candidate maintenance. However, due to the lack of true test data, all experimental results of the algorithm in this paper are mined based on artificially generated data. Our next research direction is to mine biclusters with variant usage rate and low usage rate in function-resource matrix measured in true environment.
Footnotes
Authors’ contributions
LZ and ZX conceived and designed the experiments; LZ performed the experiments; MW and LZ analyzed the data; MW and TY wrote the paper.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This paper is supported by Aviation Foundation under Grant No. 20155553036 and No. 20155515002 and it is also supported by National Key Basic Research Program of China under Grant No. 2014CB744900.
