Sage Journals: Discover world-class research

Abstract

Artificial intelligence is one of the most active research topics in computer science. Deep learning is most commonly implemented with a neural network, but neural networks have two shortcomings. First, they are not easy to understand: implementing one often requires considerable background study. Second, their structure is complex: fully connecting the nodes of a learning structure complicates the overall architecture and makes it hard for developers to track how the internal parameters change. The goal of this article is therefore to provide a more streamlined method for deep learning. A modified high-level fuzzy Petri net, called the deep Petri net, is used to perform deep learning with a structure that is simple, makes parameter changes easy to track, and runs faster than the deep neural network. The experimental results show that the deep Petri net performs better than the deep neural network in most of the tested cases.

Keywords

High-level fuzzy Petri net, deep Petri net, fuzzy reasoning, unsupervised learning, supervised learning

Introduction

Constructing a learning structure often requires considerable effort, and fully connecting its nodes makes the structure complicated, so developers cannot easily track how the internal parameters change. This motivates us to provide a streamlined method for deep learning. Shen et al.¹ demonstrated that the high-level fuzzy Petri net (HLFPN) can be applied to machine learning and that the HLFPN model offers a faster, less complex, and easier implementation. This article therefore uses a multilayer HLFPN to perform deep learning, aiming for an architecture that is simpler, easier to trace, and faster than the neural network (NN).^2–6

This work differs from that of Shen et al.¹ mainly in structure. Both the HLFPN model and the deep neural network (DNN) model depend on the type of problem,⁷ but the DNN has a fixed structure and only fine-tunes the algorithm for each layer,⁸ whereas the HLFPN structure is designed for the specific problem at hand. First, the predicate logic is written from the problem.⁹ Then, the structure is drawn according to the predicate logic and used to perform machine learning. Because the structure is designed for the problem, the HLFPN model is simpler and contains no redundant nodes, which decreases the complexity of the structure.

The proposed architecture is called the deep Petri network (DPN). As with the HLFPN, the predicate logic is first written from the problem; the structure is then drawn according to the predicate logic and used to perform targeted deep learning, so the structure inherits the advantages of a problem-specific design. The DPN keeps the structure simple and needs no redundant nodes that would increase its complexity.^10–13 The simpler structure can therefore improve the computational speed.

This study proposes the concept of a “ring.” The DPN model was developed to simulate an NN model.^14,15 An NN is formed layer by layer from left to right; the DPN instead expresses its layers as rings, like planetary orbits. The transitions and places required by the predicate logic are placed ring by ring. Every node is ultimately linked to the core, which determines whether the calculated result is as expected and performs the unsupervised or supervised deep learning. Finally, the core's decision is passed to the supervisor node on the outermost ring in order to correct the initial values of the entire system.^16–19

The supervisor node on the outermost ring closes the entire system into a complete loop, which makes the learning logic easier to observe. A DNN does not show graphically how its weights are modified, which makes it difficult for beginners to understand. In the DPN, the supervisor node monitors each change in the system to determine whether the attribution degrees are being modified in the direction of convergence.^20–22 The stability and convergence speed of the system are thereby further improved.

Section “Literature review” surveys recent NN research, lists several common NN types to show how they differ, and introduces the core algorithm of the HLFPN model. The main ideas of the unsupervised and supervised DPN models are proposed in sections “Unsupervised deep learning algorithm” and “Supervised deep learning algorithm,” respectively. The performance of the DPN is discussed in section “Main results.” Finally, section “Conclusion” presents the conclusion and future work.

Literature review

NN models have been studied extensively. In recent years, almost every developing technology has called for more automated and more intelligent functions; in other words, it needs to be combined with artificial intelligence (AI) techniques, whose foundation is deep learning. Because deep learning is now applied to all types of problems, a variety of NNs exist to deal with them.^23,24

Deep learning

Deep learning is a kind of machine learning. In general, it is based on the NN but uses more layers than the original NN.²⁵ It passes the data through many layers of linear or nonlinear transformations to extract the features of the data sets. Deep learning includes different model types for different problems, such as the DNN, recurrent neural network (RNN), convolutional neural network (CNN), and deep autoencoder (DA).^26–28

NN categories

This sub-section lists some popular NN models, distinguishes among them, and discusses their advantages.

DNN

The DNN is the most basic NN. Its structure is inspired by the human brain and is built layer by layer: between the input and output layers lie the hidden layers. Although the DNN architecture is widely used, it has the inevitable drawbacks of being slow and heavy.

RNN

An RNN uses past results to calculate the current output.³ This memory-like behavior suits time-dependent fields such as text mining and time series analysis. Current RNN research concerns how long or short the memory span should be; for example, data from several different past times may be used simultaneously to calculate the current output.

CNN

A CNN is a composite NN. In general, its architecture is a convolutional layer followed by a traditional fully connected layer. Its main advantage over the DNN is more efficient processing of multi-quadrant data. The convolutional layer filters the input data, so the two structures process the data differently: the convolutional layer first reduces the data complexity, and the fully connected layer then receives relatively simple data, which speeds up processing.

DA

A DA is a stack of multiple DNNs. It projects high-dimensional data onto a lower dimension and processes the data there. A DA uses more cells in the input and output layers and hides the less-used cells to achieve acceleration. Although the projection loses some data, the speed-up achieved by compressing the data still makes the DA a popular NN architecture.

HLFPN

The HLFPN¹ is defined as an eight-tuple HLFPN = (P, T, F, C, V, α, β, δ), where

P = {p₁, p₂, p₃,…, p_k} finite set of places.

T = {t₁, t₂, t₃,…, t_l} finite set of transitions. P ∪ T ≠∅.

F ⊆ (P × T)∪(T × P) called the flow relation and is also a finite set of arcs, each one representing the fuzzy set (i.e. fuzzy term) for an antecedent or a consequent; where the positive arcs (i.e. THEN parts) are denoted by $◯ \to$ .

C = {X, Y, Z} finite set of linguistic variables, for example, X, Y, and Z, where X = {x₁, x₂, x₃,…, x_n}, Y = {y₁, y₂, y₃,…, y_m}, and Z = {z₁, z₂, z₃,…, z_q}.

V = {v₁, v₂, v₃,…, v₄} finite set of fuzzy truth values known as the fuzzy relational matrix between the antecedent and the consequent of a rule.

α: P→C association function, mapping from places to linguistic variables. α(p_i) = c_i, i = 1,…, I, where C = {c_i} is a set of linguistic variables in the knowledge base (KB), and I is the number of linguistic variables in the KB.

β: F→[0, 1] association function, mapping from the flow relations to the fuzzy truth values between 0 and 1.

δ: T→V association function, mapping from transitions to fuzzy relational matrices.
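
As a reading aid, the eight-tuple can be mirrored in a small container type. The following Python sketch is only illustrative: the class name `HLFPN`, its field names, and the `validate` helper are assumptions made here and are not taken from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class HLFPN:
    """Hypothetical container mirroring (P, T, F, C, V, alpha, beta, delta)."""
    places: list        # P: place names
    transitions: list   # T: transition names
    arcs: set           # F: (source, target) pairs joining a place and a transition
    variables: dict     # C: linguistic variable name -> list of its terms
    relations: list     # V: fuzzy relational matrices, each a list of rows
    alpha: dict         # alpha: place -> linguistic variable name
    beta: dict          # beta: arc -> truth value in [0, 1]
    delta: dict         # delta: transition -> index into `relations`

    def validate(self) -> None:
        """Raise ValueError if the fields violate the definition above."""
        p, t = set(self.places), set(self.transitions)
        if not (p | t):
            raise ValueError("P union T must be non-empty")
        for src, dst in self.arcs:
            # Each arc must go place -> transition or transition -> place.
            if not ((src in p and dst in t) or (src in t and dst in p)):
                raise ValueError(f"arc {(src, dst)} is not in (P x T) or (T x P)")
        for place, var in self.alpha.items():
            if place not in p or var not in self.variables:
                raise ValueError(f"alpha({place!r}) = {var!r} is not a valid mapping")
        for arc, truth in self.beta.items():
            if arc not in self.arcs or not 0.0 <= truth <= 1.0:
                raise ValueError(f"beta({arc}) must lie in [0, 1] on a known arc")
        for trans, idx in self.delta.items():
            if trans not in t or not 0 <= idx < len(self.relations):
                raise ValueError(f"delta({trans!r}) has no relational matrix")
```

Keeping the three association functions as plain dictionaries makes each mapping easy to inspect, which is in the spirit of the traceability argued for in the Introduction.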

Unsupervised learning

The unsupervised learning scheme has no feedback coming from the environment to show what the desired outputs of a network should be or whether they are correct. The network must train itself to capture any relationship of interest from the input data and transform the captured relationships into outputs.^1,11

Supervised learning

Supervised learning is the task of adjusting a function that maps an input to an output based on example input–output pairs. It infers a function from the labeled training data including a set of training examples. In the supervised learning, each example is a pair of input object and the desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.²

Unsupervised deep learning algorithm

The DPN is a modified HLFPN model with a ring structure. The transitions and places are placed on the rings, and each ring corresponds to a layer in traditional deep learning.

INPUT: Input place (IP) Mem( $p_{i}$ ) (or fuzzy set), $\forall p_{i} \in IP$ , where IP denotes a set of input places.

OUTPUT: Output place (OP) Mem $(p_{i})$ (or fuzzy set), $\forall p_{i} \in OP$ , where OP denotes a set of output places.

Procedure:

Step 1: Initialize, check whether Mem $(P_{i})$ (or fuzzy set) is empty or not.

Step 2: Calculate fuzzy relational matrices V $(t_{i})$ of the current transition, where

V $(t_{i})$ denotes a fuzzy relational matrix.

Step 3: Use the input data items to calculate the associated possibility difference $δ$ , defined as δ = W_a-input − W_a , where W_a-input denotes a vector of input data items (or fuzzy sets) at the antecedent. (Here α and δ denote the learning constant and the difference, not the association functions of the HLFPN definition.)

Step 4: For each component of δ, if δ > 0, then W_a′ = W_a + α|δ|; if δ ≤ 0, then W_a′ = W_a , where

α is a learning constant,

|δ| denotes the absolute values of vector δ ,

W_a denotes a present weight (or fuzzy set), and

W_a′ denotes a new weight (or fuzzy set).

Step 5: For every transition t^j, fire the enabled transitions, perform the cylindrical extension, and execute Zadeh’s max–min operation, that is, compute W_c′ = W_a′ o V(t^j), where W_c′ denotes a vector of new data items (or fuzzy sets) at the consequent.

Step 6: Return to Step 3 until no transition is available to be fired.

Step 7: Send the results to the core, transform them into the output place, and transfer the data sets to the supervisor place on the outer ring.

Step 8: Repeat the above steps until all input training data sets end.
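
Steps 3 to 5 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `update_weights` and `max_min_compose` are hypothetical, the update is assumed to be applied componentwise, and the relational matrix is assumed to be stored as a list of rows so that the composition reads $(W o V)_{j} = \max_{i} \min(W_{i}, V_{ij})$. The numbers are illustrative values of the kind used in Example 1.

```python
def update_weights(w_a, w_input, alpha=1.0):
    """Steps 3-4: delta = w_input - w_a; raise w_a by alpha*|delta| where delta > 0."""
    new_w = []
    for present, observed in zip(w_a, w_input):
        delta = observed - present
        new_w.append(present + alpha * abs(delta) if delta > 0 else present)
    return new_w


def max_min_compose(w, v):
    """Step 5: Zadeh's max-min composition of row vector w with matrix v."""
    if len(w) != len(v):
        raise ValueError("w must have one entry per row of v")
    n_cols = len(v[0])
    return [max(min(w_i, row[j]) for w_i, row in zip(w, v)) for j in range(n_cols)]


# Present weight, observed input, and one relational matrix (assumed values).
w_new = [round(x, 10) for x in update_weights([0.2, 0.6, 0.3], [0.3, 0.7, 0.2])]
v = [[0.2, 0.3, 0.1],
     [0.2, 0.7, 0.1],
     [0.2, 0.7, 0.1]]
w_c = max_min_compose(w_new, v)
print(w_new, w_c)
```

With these values the sketch gives W_a′ = (0.3, 0.7, 0.3) and W_c′ = (0.2, 0.7, 0.1). Rounding is applied only to keep floating-point noise out of the printed result.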

Example 1:

The fuzzy production rules are shown as follows:

$R_{1} : if X_{1} is A and X_{2} is B then Y_{1} is C$

$R_{2} : if X_{2} is B and X_{3} is D then O is E$

$R_{3} : if Y_{1} is C then O is F$

The predicate logic form is shown below.

$X_{1} (A) \land X_{2} (B) \to Y_{1} (C)$

$X_{2} (B) \land X_{3} (D) \to O (E)$

$Y_{1} (C) \to O (F)$

According to the predicate logic form shown above, we get Figure 1.

Figure 1.

Unsupervised DPN for Example 1.

Numerical analysis

Step 1: Initialize

The initial marking is

\begin{matrix} \begin{matrix} p_{1} & p_{2} & p_{3} & p_{4} & p_{5} & p_{6} \end{matrix} \\ M_{0} = [\begin{matrix} 1 & 1 & 1 & 0 & 0 & 0 \end{matrix}] \end{matrix}

A nonzero entry means that the place holds a token available to fire its transition. Assume that the fuzzy sets are shown as follows:

\begin{matrix} A = \frac{0.2}{a_{1}} + \frac{0.6}{a_{2}} + \frac{0.3}{a_{3}} \\ B = \frac{0.2}{b_{1}} + \frac{0.8}{b_{2}} + \frac{0.1}{b_{3}} \\ C = \frac{0.3}{c_{1}} + \frac{0.9}{c_{2}} + \frac{0.4}{c_{3}} \\ D = \frac{0.1}{d_{1}} + \frac{0.9}{d_{2}} + \frac{0.3}{d_{3}} \\ E = \frac{0.2}{e_{1}} + \frac{0.9}{e_{2}} + \frac{0.1}{e_{3}} \\ F = \frac{0.1}{f_{1}} + \frac{0.7}{f_{2}} + \frac{0.3}{f_{3}} \end{matrix}

Step 2: Calculate the fuzzy relational matrices.

\begin{matrix} A \times B = {(\begin{matrix} 0.2 & 0.6 & 0.3 \end{matrix})}^{T} \land (\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix}) \\ = | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.6 & 0.1 \\ 0.2 & 0.3 & 0.1 \end{matrix} | \\ V_{11} (t_{1}) = | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.3 & 0.1 \\ 0.2 & 0.3 & 0.1 \end{matrix} | \in A \times B \times c_{1} \\ V_{12} (t_{1}) = | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.6 & 0.1 \\ 0.2 & 0.3 & 0.1 \end{matrix} | \in A \times B \times c_{2} \\ V_{13} (t_{1}) = | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.4 & 0.1 \\ 0.2 & 0.3 & 0.1 \end{matrix} | \in A \times B \times c_{3} \\ B \times D = {(\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix})}^{T} \land (\begin{matrix} 0.1 & 0.9 & 0.3 \end{matrix}) \\ = | \begin{matrix} 0.1 & 0.2 & 0.2 \\ 0.1 & 0.8 & 0.3 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \\ V_{21} (t_{2}) = | \begin{matrix} 0.1 & 0.2 & 0.1 \\ 0.1 & 0.2 & 0.2 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \in B \times D \times e_{1} \end{matrix}

\begin{matrix} V_{22} (t_{2}) = | \begin{matrix} 0.1 & 0.2 & 0.2 \\ 0.1 & 0.8 & 0.3 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \in B \times D \times e_{2} \\ V_{23} (t_{2}) = | \begin{matrix} 0.1 & 0.1 & 0.1 \\ 0.1 & 0.1 & 0.1 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \in B \times D \times e_{3} \\ V_{3} (t_{3}) = {(\begin{matrix} 0.3 & 0.9 & 0.4 \end{matrix})}^{T} \land (\begin{matrix} 0.1 & 0.7 & 0.3 \end{matrix}) \\ = | \begin{matrix} 0.1 & 0.3 & 0.3 \\ 0.1 & 0.7 & 0.3 \\ 0.1 & 0.4 & 0.3 \end{matrix} | \end{matrix}
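
The matrices for $t_{1}$ above follow a regular pattern: the Cartesian product of two antecedent fuzzy sets takes the pairwise minimum, and each slice $V_{1k} (t_{1})$ is that product truncated at the consequent membership $c_{k}$. The Python sketch below reproduces $A \times B$ and the three slices for $t_{1}$. The helper names `cartesian_min` and `truncate` are hypothetical, and the truncation rule is inferred from the numbers shown rather than stated in the text.

```python
def cartesian_min(a, b):
    """Fuzzy Cartesian product: entry (i, j) is min(a[i], b[j])."""
    return [[min(a_i, b_j) for b_j in b] for a_i in a]


def truncate(matrix, level):
    """Cap every entry of the matrix at the given membership level."""
    return [[min(x, level) for x in row] for row in matrix]


# Membership values of A, B, and C from Example 1.
A = [0.2, 0.6, 0.3]
B = [0.2, 0.8, 0.1]
C = [0.3, 0.9, 0.4]

a_x_b = cartesian_min(A, B)
slices = [truncate(a_x_b, c_k) for c_k in C]   # V_11, V_12, V_13

for name, matrix in zip(("V_11", "V_12", "V_13"), slices):
    print(name, matrix)
```

Because only `min` is involved, no floating-point rounding occurs, so the printed slices can be compared entry by entry with the matrices listed in Step 2.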

Steps 3–6: Assume that the first input data items are shown as follows:

\begin{matrix} X^{(1)} = \{ {A'}_{1}, {B'}_{1}, {D'}_{1} \} \\ {A'}_{1} = \frac{0.3}{{a'}_{11}} + \frac{0.7}{{a'}_{12}} + \frac{0.2}{{a'}_{13}} \\ {B'}_{1} = \frac{0.2}{{b'}_{11}} + \frac{0.8}{{b'}_{12}} + \frac{0.1}{{b'}_{13}} \\ {D'}_{1} = \frac{0.1}{{d'}_{11}} + \frac{0.8}{{d'}_{12}} + \frac{0.3}{{d'}_{13}} \end{matrix}

Assume that the transition firing sequence is $M_{0} t_{1} M_{1} t_{2} M_{2} t_{3} M_{3} t_{4} M_{4}$ .

\begin{matrix} p_{1} & p_{2} & p_{3} & p_{4} & p_{5} \end{matrix}

M_{1} = [\begin{matrix} 0 & 1 & 1 & 1 & 0 \end{matrix}]

M_{2} = [\begin{matrix} 0 & 0 & 0 & 1 & 1 \end{matrix}]

M_{3} = [\begin{matrix} 0 & 0 & 0 & 0 & 2 \end{matrix}]

Calculate $δ_{i}$

δ_{1} = A'_{1} - A = [\begin{matrix} 0.1 & 0.1 & - 0.1 \end{matrix}]

δ_{2} = B'_{1} - B = [\begin{matrix} 0.0 & 0.0 & 0.0 \end{matrix}]

δ_{3} = D'_{1} - D = [\begin{matrix} 0.0 & - 0.1 & 0.0 \end{matrix}]

Assume $α = 1$ , then we have

A' = \frac{0.3}{{a'}_{11}} + \frac{0.7}{{a'}_{12}} + \frac{0.3}{{a'}_{13}}

B' = \frac{0.2}{{b'}_{11}} + \frac{0.8}{{b'}_{12}} + \frac{0.1}{{b'}_{13}}

D' = \frac{0.1}{{d'}_{11}} + \frac{0.8}{{d'}_{12}} + \frac{0.3}{{d'}_{13}}

\begin{matrix} A' \times B' = {(\begin{matrix} 0.3 & 0.7 & 0.3 \end{matrix})}^{T} \land (\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix}) \\ = | \begin{matrix} 0.2 & 0.3 & 0.1 \\ 0.2 & 0.7 & 0.1 \\ 0.2 & 0.3 & 0.1 \end{matrix} | \end{matrix}

V'_{11} (t_{1}) = | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.2 & 0.1 \\ 0.2 & 0.2 & 0.1 \end{matrix} | \in A' \times B' \times c_{1}

V'_{12} (t_{1}) = | \begin{matrix} 0.2 & 0.3 & 0.1 \\ 0.2 & 0.7 & 0.1 \\ 0.2 & 0.7 & 0.1 \end{matrix} | \in A' \times B' \times c_{2}

V'_{13} (t_{1}) = | \begin{matrix} 0.2 & 0.3 & 0.1 \\ 0.2 & 0.4 & 0.1 \\ 0.2 & 0.4 & 0.1 \end{matrix} | \in A' \times B' \times c_{3}

(\begin{matrix} 0.3 & 0.7 & 0.3 \end{matrix}) o V'_{11} (t_{1}) = (\begin{matrix} 0.2 & 0.2 & 0.1 \end{matrix})

(\begin{matrix} 0.3 & 0.7 & 0.3 \end{matrix}) o V'_{12} (t_{1}) = (\begin{matrix} 0.2 & 0.7 & 0.1 \end{matrix})

(\begin{matrix} 0.3 & 0.7 & 0.3 \end{matrix}) o V'_{13} (t_{1}) = (\begin{matrix} 0.2 & 0.4 & 0.1 \end{matrix})

\begin{matrix} C'_{1} = B'_{1} o (A'_{1} o V_{1} (t_{1})) \\ = (\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix}) o | \begin{matrix} 0.2 & 0.2 & 0.2 \\ 0.3 & 0.6 & 0.4 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \\ = (\begin{matrix} 0.3 & 0.6 & 0.4 \end{matrix}) \end{matrix}

\begin{matrix} B' \times D' = {(\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix})}^{T} \land (\begin{matrix} 0.2 & 0.9 & 0.1 \end{matrix}) \\ = | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \end{matrix}

V'_{21} (t_{2}) = | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.2 & 0.1 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \in B' \times D' \times e_{1}

V'_{22} (t_{2}) = | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \in B' \times D' \times e_{2}

V'_{23} (t_{2}) = | \begin{matrix} 0.1 & 0.1 & 0.1 \\ 0.1 & 0.1 & 0.1 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \in B' \times D' \times e_{3}

(\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix}) o V'_{21} (t_{2}) = (\begin{matrix} 0.2 & 0.2 & 0.1 \end{matrix})

(\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix}) o V'_{22} (t_{2}) = (\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix})

(\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix}) o V'_{23} (t_{2}) = (\begin{matrix} 0.1 & 0.1 & 0.1 \end{matrix})

\begin{matrix} E'_{1} = D'_{1} o ({B'}_{1} o V_{2} (t_{2})) \\ = (\begin{matrix} 0.2 & 0.9 & 0.1 \end{matrix}) o | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \\ = (\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix}) \end{matrix}

F_{1} = \frac{0.1}{f_{1}} + \frac{0.6}{f_{2}} + \frac{0.3}{f_{3}}

Step 7: Finally, the actual output is shown below

O_{a} = E_{1} \cup F_{1} = \frac{0.2}{O_{a 1}} + \frac{0.8}{O_{a 2}} + \frac{0.3}{O_{a 3}}
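
The union in Step 7 is the standard fuzzy union, that is, the componentwise maximum of the membership vectors. A short Python check follows, using the values of $E'_{1}$ and $F_{1}$ given above; the helper name `fuzzy_union` is hypothetical.

```python
def fuzzy_union(*sets):
    """Componentwise maximum of equally long membership vectors."""
    if len({len(s) for s in sets}) != 1:
        raise ValueError("all fuzzy sets must have the same length")
    return [max(values) for values in zip(*sets)]


E1 = [0.2, 0.8, 0.1]
F1 = [0.1, 0.6, 0.3]
O_a = fuzzy_union(E1, F1)
print(O_a)  # [0.2, 0.8, 0.3]
```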

Step 8: Repeat the next iteration.

Supervised deep learning algorithm

Supervised learning differs from unsupervised learning in that the core additionally checks the convergence of the ring. If the result is not as expected, the core feeds back to the supervisor place on the ring to correct the attribution.

INPUT: Input place (IP) Mem $(p_{i})$ (or fuzzy set), $\forall p_{i} \in IP$ , where IP denotes a set of input places.

OUTPUT: Output place (OP) Mem $(p_{i})$ (or fuzzy set), $\forall p_{i} \in OP$ , where OP denotes a set of output places.

PROCEDURE:

Step 1: Initialize, check whether Mem $(P_{i})$ (or fuzzy set) is empty or not.

Step 2: Calculate the fuzzy relational matrices V $(t_{i})$ of the current transition and use the input to perform cylindrical extension.

Step 3: Fire the enabled transitions, and execute Zadeh’s max–min operation.

Step 4: Return to Step 3 until no enabled transition is available.

Step 5: Send the results to the core and transform them into the output place. Transfer the data sets to the supervisor place on the outer ring.

Step 6: Subtract the desired output $O_{d}$ on the supervisor place from the actual output $O_{a}$ , that is, $δ = O_{a} - O_{d}$ . Where the difference is positive, make the initial values on the outer ring smaller; where it is negative, make them larger.

Step 7: Recalculate the new fuzzy relational matrices $V' (t_{i})$ with the new input data sets, repeat Steps 2–7.

Step 8: Repeat the above steps until all input training data sets end.
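
Step 6 states the direction of the correction but not its size. One possible reading, sketched below in Python, subtracts a multiple of $δ = O_{a} - O_{d}$ from each membership value and clips the result to [0, 1]. The function name `supervised_adjust`, the learning constant `alpha`, and the clipping are assumptions of this sketch, not a rule stated by the authors.

```python
def supervised_adjust(fuzzy_set, actual, desired, alpha=1.0):
    """Move each membership value against delta = actual - desired, clipped to [0, 1]."""
    adjusted = []
    for value, o_a, o_d in zip(fuzzy_set, actual, desired):
        delta = o_a - o_d
        # Positive delta (output too high) lowers the value; negative delta raises it.
        new_value = round(value - alpha * delta, 10)
        adjusted.append(min(1.0, max(0.0, new_value)))
    return adjusted


# Actual and desired outputs as used in Example 2.
O_a = [0.2, 0.6, 0.3]
O_d = [0.1, 0.8, 0.3]
print(supervised_adjust([0.1, 0.4, 0.3], O_a, O_d))
print(supervised_adjust([0.4, 0.9, 0.3], O_a, O_d))
```

With `alpha = 1` this reading maps (0.1, 0.4, 0.3) to (0.0, 0.6, 0.3) and clips (0.4, 0.9, 0.3) to (0.3, 1.0, 0.3), which is in line with the readjusted $F'_{1}$ and $D'_{1}$ listed in Step 7 of Example 2. It should still be treated as an interpretation, since the article does not give the formula.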

Example 2:

The fuzzy production rules are shown as follows:

$R_{1} : if X_{1} is A and X_{2} is B then Y_{1} is C$

$R_{2} : if X_{2} is B and X_{3} is D then Y_{2} is E$

$R_{3} : if Y_{1} is C then O_{a} is F_{1}$

$R_{4} : if Y_{2} is E then O_{a} is F_{2}$

Thus, the predicate logic form is shown below.

$X_{1} (A) \land X_{2} (B) \to Y_{1} (C)$

$X_{2} (B) \land X_{3} (D) \to Y_{2} (E)$

$Y_{1} (C) \to O_{a} (F_{1})$

$Y_{2} (E) \to O_{a} (F_{2})$

According to the predicate logic form shown above, we get Figure 2.

Figure 2.

Supervised DPN for Example 2.

Numerical analysis

Step 1: Initialize

The initial marking is

\begin{matrix} p_{1} & p_{2} & p_{3} & p_{4} & p_{5} & p_{6} \end{matrix}

M_{0} = [\begin{matrix} 1 & 1 & 1 & 0 & 0 & 0 \end{matrix}]

A nonzero entry means that the place holds a token available to fire its transition.

Assume that the fuzzy sets are shown as follows:

\begin{matrix} A = \frac{0.4}{a_{1}} + \frac{0.8}{a_{2}} + \frac{0.3}{a_{3}} \\ B = \frac{0.2}{b_{1}} + \frac{0.8}{b_{2}} + \frac{0.1}{b_{3}} \\ C = \frac{0.2}{c_{1}} + \frac{0.9}{c_{2}} + \frac{0.3}{c_{3}} \\ D = \frac{0.1}{d_{1}} + \frac{0.9}{d_{2}} + \frac{0.3}{d_{3}} \\ E = \frac{0.1}{e_{1}} + \frac{0.6}{e_{2}} + \frac{0.2}{e_{3}} \\ F_{1} = \frac{0.1}{f_{1}} + \frac{0.7}{f_{2}} + \frac{0.3}{f_{3}} \\ F_{2} = \frac{0.2}{g_{1}} + \frac{0.6}{g_{2}} + \frac{0.3}{g_{3}} \end{matrix}

Step 2: Calculate the fuzzy relational matrices

\begin{matrix} A \times B = {(\begin{matrix} 0.4 & 0.8 & 0.3 \end{matrix})}^{T} \land (\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix}) \\ = | \begin{matrix} 0.2 & 0.4 & 0.1 \\ 0.2 & 0.8 & 0.1 \\ 0.2 & 0.3 & 0.1 \end{matrix} | \\ V_{11} (t_{1}) = | \begin{matrix} 0.2 & 0.2 & 0.1 \\ 0.2 & 0.2 & 0.1 \\ 0.2 & 0.2 & 0.1 \end{matrix} | \in A \times B \times c_{1} \\ V_{12} (t_{1}) = | \begin{matrix} 0.2 & 0.4 & 0.1 \\ 0.2 & 0.8 & 0.1 \\ 0.2 & 0.3 & 0.1 \end{matrix} | \in A \times B \times c_{2} \end{matrix}

\begin{matrix} V_{13} (t_{1}) = | \begin{matrix} 0.2 & 0.3 & 0.1 \\ 0.2 & 0.3 & 0.1 \\ 0.2 & 0.2 & 0.1 \end{matrix} | \in A \times B \times c_{3} \\ B \times D = {(\begin{matrix} 0.2 & 0.8 & 0.1 \end{matrix})}^{T} \land (\begin{matrix} 0.1 & 0.9 & 0.3 \end{matrix}) \\ = | \begin{matrix} 0.1 & 0.2 & 0.2 \\ 0.1 & 0.8 & 0.3 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \\ V_{21} (t_{2}) = | \begin{matrix} 0.1 & 0.1 & 0.1 \\ 0.1 & 0.1 & 0.1 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \in B \times D \times e_{1} \\ V_{22} (t_{2}) = | \begin{matrix} 0.1 & 0.2 & 0.2 \\ 0.1 & 0.6 & 0.3 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \in B \times D \times e_{2} \\ V_{23} (t_{2}) = | \begin{matrix} 0.1 & 0.2 & 0.2 \\ 0.1 & 0.2 & 0.2 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \in B \times D \times e_{3} \\ V_{3} (t_{3}) = {(\begin{matrix} 0.2 & 0.9 & 0.3 \end{matrix})}^{T} \land (\begin{matrix} 0.1 & 0.7 & 0.3 \end{matrix}) \\ = | \begin{matrix} 0.1 & 0.2 & 0.2 \\ 0.1 & 0.7 & 0.3 \\ 0.1 & 0.3 & 0.3 \end{matrix} | \\ V_{4} (t_{4}) = {(\begin{matrix} 0.1 & 0.6 & 0.2 \end{matrix})}^{T} \land (\begin{matrix} 0.2 & 0.6 & 0.3 \end{matrix}) \\ = | \begin{matrix} 0.1 & 0.1 & 0.1 \\ 0.2 & 0.6 & 0.6 \\ 0.2 & 0.2 & 0.2 \end{matrix} | \end{matrix}

Steps 3–5: Assume that the first input data items are shown as follows

X^{(1)} = \{ {A'}_{1}, {B'}_{1}, {D'}_{1} \}

A'_{1} = \frac{0.3}{{a'}_{11}} + \frac{0.4}{{a'}_{12}} + \frac{0.1}{{a'}_{13}}

B'_{1} = \frac{0.1}{{b'}_{11}} + \frac{0.7}{{b'}_{12}} + \frac{0.4}{{b'}_{13}}

D'_{1} = \frac{0.4}{{d'}_{11}} + \frac{0.9}{{d'}_{12}} + \frac{0.3}{{d'}_{13}}

Assume that the transition firing sequence is $M_{0} t_{1} M_{1} t_{2} M_{2} t_{3} M_{3} t_{4} M_{4}$ .

\begin{matrix} \begin{matrix} p_{1} & p_{2} & p_{3} & p_{4} & p_{5} & p_{6} \end{matrix} \\ M_{1} = [\begin{matrix} 0 & 1 & 1 & 1 & 0 & 0 \end{matrix}] \\ M_{2} = [\begin{matrix} 0 & 0 & 0 & 1 & 1 & 0 \end{matrix}] \\ M_{3} = [\begin{matrix} 0 & 0 & 0 & 0 & 1 & 1 \end{matrix}] \\ M_{4} = [\begin{matrix} 0 & 0 & 0 & 0 & 0 & 2 \end{matrix}] \\ (\begin{matrix} 0.3 & 0.4 & 0.1 \end{matrix}) o V_{11} (t_{1}) = (\begin{matrix} 0.2 & 0.4 & 0.1 \end{matrix}) \\ (\begin{matrix} 0.3 & 0.4 & 0.1 \end{matrix}) o V_{12} (t_{1}) = (\begin{matrix} 0.2 & 0.4 & 0.1 \end{matrix}) \\ (\begin{matrix} 0.3 & 0.4 & 0.1 \end{matrix}) o V_{13} (t_{1}) = (\begin{matrix} 0.2 & 0.3 & 0.1 \end{matrix}) \\ {C'}_{1} = {B'}_{1} o ({A'}_{1} o V_{1} (t_{1})) \\ = (\begin{matrix} 0.1 & 0.7 & 0.4 \end{matrix}) o | \begin{matrix} 0.2 & 0.2 & 0.2 \\ 0.4 & 0.4 & 0.3 \\ 0.1 & 0.1 & 0.1 \end{matrix} | \\ = (\begin{matrix} 0.4 & 0.4 & 0.3 \end{matrix}) \\ (\begin{matrix} 0.1 & 0.7 & 0.4 \end{matrix}) o V_{21} (t_{2}) = (\begin{matrix} 0.1 & 0.1 & 0.1 \end{matrix}) \\ (\begin{matrix} 0.1 & 0.7 & 0.4 \end{matrix}) o V_{22} (t_{2}) = (\begin{matrix} 0.1 & 0.6 & 0.3 \end{matrix}) \\ (\begin{matrix} 0.1 & 0.7 & 0.4 \end{matrix}) o V_{23} (t_{3}) = (\begin{matrix} 0.1 & 0.2 & 0.2 \end{matrix}) \\ {E'}_{1} = (\begin{matrix} 0.4 & 0.9 & 0.3 \end{matrix}) o | \begin{matrix} 0.1 & 0.1 & 0.1 \\ 0.1 & 0.6 & 0.2 \\ 0.1 & 0.3 & 0.2 \end{matrix} | \\ = (\begin{matrix} 0.1 & 0.6 & 0.2 \end{matrix}) \\ {F'}_{1} = {C'}_{1} o V_{3} (t_{3}) = \frac{0.1}{f_{1}} + \frac{0.4}{f_{2}} + \frac{0.3}{f_{3}} \\ {F'}_{2} = {E'}_{1} o V_{4} (t_{4}) = \frac{0.2}{f_{1}} + \frac{0.6}{f_{2}} + \frac{0.2}{f_{3}} \end{matrix}

The actual output $O_{a}$ is

O_{a} = F'_{1} \cup F'_{2} = \frac{0.2}{O_{a 1}} + \frac{0.6}{O_{a 2}} + \frac{0.3}{O_{a 3}}

Step 6: The desired output $O_{d}$ on the supervisor place is subtracted from the actual output $O_{a}$ , $δ = O_{a} - O_{d}$ .

Assume that the desired output is $O_{d}$

\begin{matrix} O_{d} = \frac{0.1}{O_{d 1}} + \frac{0.8}{O_{d 2}} + \frac{0.3}{O_{d 3}} \\ δ = O_{a} - O_{d} = [\begin{matrix} 0.1 & - 0.2 & 0.0 \end{matrix}] \end{matrix}

Step 7: Readjust all fuzzy sets and recalculate the fuzzy relational matrices as follows

\begin{matrix} {A'}_{1} = \frac{0.2}{{a'}_{11}} + \frac{0.6}{{a'}_{12}} + \frac{0.1}{{a'}_{13}} \\ {B'}_{1} = \frac{0.1}{{b'}_{11}} + \frac{0.9}{{b'}_{12}} + \frac{0.4}{{b'}_{13}} \\ {C'}_{1} = \frac{0.3}{{c'}_{11}} + \frac{0.6}{{c'}_{12}} + \frac{0.3}{{c'}_{13}} \\ {D'}_{1} = \frac{0.3}{{d'}_{11}} + \frac{1.0}{{d'}_{12}} + \frac{0.3}{{d'}_{13}} \\ {E'}_{1} = \frac{0.0}{{e'}_{11}} + \frac{0.8}{{e'}_{12}} + \frac{0.2}{{e'}_{13}} \\ {F'}_{1} = \frac{0.0}{f_{1}} + \frac{0.6}{f_{2}} + \frac{0.3}{f_{3}} \\ {F'}_{2} = \frac{0.1}{f_{1}} + \frac{0.8}{f_{2}} + \frac{0.2}{f_{3}} \end{matrix}

Step 8: Repeat the next iteration.

Main results

This section analyzes the DPN. We first compare the structure of the DNN with that of the DPN and then, by observing both structures, use examples to compare their characteristics.

Research tools

The experiments compare the DPN with the DNN. The DPN was implemented in C++ and run on a 2.2 GHz Intel Core i7 with 16 GB of 1600 MHz DDR3 memory.

The DNN is based on SONY’s Neural Network Cloud, which is open source for system developers. Users can build DNN models simply by choosing the functions they need, without writing any program.¹⁰

Benchmarks

The fuzzy car problem is to determine a car’s acceleration as it slides down from the top of a U-shaped track. The magnitude and direction of the acceleration differ at every moment.¹⁶

The Pole and Rotation problem is to pull an object in such a way that a stick standing on it does not fall down. Under normal conditions, the stick falls in the direction of acceleration, so the system needs to learn how to control the acceleration correctly, moving the main object so that the stick stays upright.¹⁸

The t-Test compares the mean values of two data sets to check whether or not they differ.³ The test can deal with unknown variances and with independent or small data sets. In this sub-section, we use only the logic rules of the t-Test to conduct our experiments.¹⁹

The output-feedback fuzzy controller (OFFC) is a part of the rotational/translational proof-mass actuator (RTAC),⁴ in which it acts as a nonlinear controller. It can process approximations without using exact mathematical quantities. In our experiments, we use only its processing logic rules to test our structures.

The fuzzy production rules for fuzzy car logic are shown as follows:

IF the car $(X_{1})$ is moving right $(A)$ AND the speed $(X_{2})$ is higher right $(B)$ THEN the acceleration $(Y_{1})$ is positive $(C)$ .

IF the car $(X_{1})$ is moving right $(A)$ AND the speed $(X_{2})$ is lower right $(D)$ THEN the acceleration $(Y_{1})$ is negative $(E)$ .

According to the fuzzy production rules shown above, we get Figure 3.

Figure 3.

DPN of fuzzy car rules.

The fuzzy production rules for Pole and Rotation logic are shown as follows:

IF the pole direction $(X_{1})$ is right $(A)$ AND the rotation direction $(X_{2})$ is higher right $(B)$ THEN the car $(X_{3})$ is moving right $(C)$ .

IF the car $(X_{3})$ is moving right $(C)$ AND the speed $(X_{4})$ is higher right $(D)$ THEN the acceleration $(Y_{1})$ is positive $(E)$ .

IF the pole direction $(X_{1})$ is left $(F)$ AND the rotation direction $(X_{2})$ is higher left $(G)$ THEN the car $(X_{3})$ is moving left $(H)$ .

IF the car $(X_{3})$ is moving left $(H)$ AND the speed $(X_{4})$ is higher left $(I)$ THEN the acceleration $(Y_{1})$ is negative $(J)$ .

According to the fuzzy production rules shown above, we get Figure 4.

Figure 4.

DPN of pole and rotation rules.

The fuzzy production rules for t-Test logic are shown as follows:

IF the number (P) is significant $(A)$ THEN the efficacy $(Y_{1})$ is high $(C)$ .

IF the number (P) is not significant $(B)$ THEN the efficacy $(Y_{1})$ is low $(D)$ .

According to the fuzzy production rules shown above, we get Figure 5.

Figure 5.

DPN of t-Test rules.

The fuzzy production rules for OFFC logic are shown as follows:

IF the pointer (P) is positive $(A)$ AND the eigenvalue $(X)$ is positive $(C)$ THEN the result (U) is negative $(E)$ .

IF the pointer (P) is negative $(B)$ AND the eigenvalue $(X)$ is positive $(C)$ THEN the result (U) is non-existent $(F)$ .

IF the pointer (P) is positive $(A)$ AND the eigenvalue $(X)$ is negative $(D)$ THEN the result (U) is non-existent $(G)$ .

IF the pointer (P) is negative $(B)$ AND the eigenvalue $(X)$ is negative $(D)$ THEN the result (U) is positive $(H)$ .

According to the fuzzy production rules shown above, we get Figure 6.

Figure 6.

DPN of OFFC rules.

Experimental results

In this sub-section, the results of running four benchmarks are presented to evaluate the performance of the unsupervised learning DPN and the supervised learning DPN. Those experimental results regarding the unsupervised learning and the supervised learning algorithms are shown in Table 1.

Table 1.

Performance comparison between unsupervised learning and supervised learning.

Benchmarks	Unsupervised learning				Supervised learning
	Learning accuracy (%)		Convergence time (seconds)		Learning accuracy (%)		Convergence time (seconds)
	DPN	DNN	DPN	DNN	DPN	DNN	DPN	DNN
Fuzzy car	94.0	93.1	18.3	18.1	94.3	93.4	17.3	18.0
Pole and rotation	92.6	92.4	18.9	19.2	93.5	93.2	18.2	18.7
t-Test	89.7	81.8	14.7	16.1	90.0	81.9	14.9	15.2
OFFC	88.6	90.5	20.5	20.7	91.8	91.7	15.2	17.7

DPN: deep Petri network; DNN: deep neural network; OFFC: output-feedback fuzzy controller.

According to the experimental results in Table 1, we assemble those values in Figures 7 and 8.

Figure 7.

Learning accuracy of four structures.

Figure 8.

Convergence speed of four structures.

In summary, the experimental results in Figures 7 and 8 indicate that the learning accuracy of the DPN is higher than that of the DNN in seven of the eight comparisons; the exception is OFFC under unsupervised learning (88.6% versus 90.5%). Likewise, the convergence time of the DPN is shorter than that of the DNN in seven of the eight comparisons; the exception is the fuzzy car under unsupervised learning (18.3 versus 18.1 seconds). Overall, the DPN performs the unsupervised and the supervised deep learning better than the DNN in most cases across the four benchmarks.
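
The comparison can be tallied directly from Table 1. The Python sketch below copies the (DPN, DNN) pairs from the table and lists the cells in which the DPN value is not the better one, treating higher accuracy and lower convergence time as better. The data layout, the column keys, and the function name `dpn_exceptions` are illustrative choices made here.

```python
# (DPN, DNN) pairs copied from Table 1; "acc" in percent, "time" in seconds.
TABLE_1 = {
    "Fuzzy car": {"unsup_acc": (94.0, 93.1), "unsup_time": (18.3, 18.1),
                  "sup_acc": (94.3, 93.4), "sup_time": (17.3, 18.0)},
    "Pole and rotation": {"unsup_acc": (92.6, 92.4), "unsup_time": (18.9, 19.2),
                          "sup_acc": (93.5, 93.2), "sup_time": (18.2, 18.7)},
    "t-Test": {"unsup_acc": (89.7, 81.8), "unsup_time": (14.7, 16.1),
               "sup_acc": (90.0, 81.9), "sup_time": (14.9, 15.2)},
    "OFFC": {"unsup_acc": (88.6, 90.5), "unsup_time": (20.5, 20.7),
             "sup_acc": (91.8, 91.7), "sup_time": (15.2, 17.7)},
}


def dpn_exceptions(table):
    """Return (benchmark, column) pairs where the DPN value is not the better one."""
    exceptions = []
    for benchmark, columns in table.items():
        for column, (dpn, dnn) in columns.items():
            better = dpn > dnn if column.endswith("acc") else dpn < dnn
            if not better:
                exceptions.append((benchmark, column))
    return exceptions


total_cells = sum(len(columns) for columns in TABLE_1.values())
exceptions = dpn_exceptions(TABLE_1)
print(f"DPN better in {total_cells - len(exceptions)} of {total_cells} cells")
print(exceptions)
```

The tally gives 14 of 16 cells in favor of the DPN, with the two remaining cells being the unsupervised convergence time for the fuzzy car and the unsupervised accuracy for OFFC.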

Functional comparison

Based on the above experimental results, we compare the two models, DPN and DNN, and summarize their main differences in Table 2.

Table 2.

Functional comparisons.

Model	Data structure	Information storage	Applicability	Learning flexibility
DPN	Compact	Small	Broad	Better
DNN	Complex	Large	Limited	Worse

DPN: deep Petri network; DNN: deep neural network.

Conclusion

This article has established the DPN model, a new model for unsupervised and supervised deep learning based on Petri net theory. The contributions of DPN are as follows:

Unlike the fixed, fully connected structure of DNN, the structure of DPN is not fixed. With DPN, the problem’s properties are analyzed before the structure for the problem is established.

Since each structure in DPN is designed for the problem itself, it contains only the necessary nodes. The number of nodes in DPN is therefore smaller than that in DNN for the same problem.

Because DPN has fewer nodes, the parameters are adjusted once all the input data have been processed, instead of a classification decision being made each time. The smaller number of nodes thus makes the overall convergence faster.

However, the benchmarks used in this article involve only uncomplicated logic. Applying DPN to more complicated types of data sets, such as images or data streams, requires further research, in particular on how to define image or stream data types in fuzzy logic. If those types have complicated fuzzy relations, DPN will need to be further improved.

Footnotes

Acknowledgements

The authors are very grateful to the anonymous reviewers for their constructive comments which have improved the quality of this paper.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C., under grants MOST 107-2221-E-845-001-MY3 and MOST 107-2221-E-845-002-MY3.

ORCID iD

Victor RL Shen

References

1. Shen VRL, Chang Y-S, Juang TT-Y. Supervised and unsupervised learning by using Petri nets. IEEE T Syst Man Cyber A: Syst Hum 2010; 40(2): 363–375.

2. Mahmud, Kaiser, Hussain, et al. Applications of deep learning and reinforcement learning to biological data. IEEE T Neu Netw Learn Syst 2018; 29(6): 2063–2079.

3. Almiani, AbuGhazleh, Al Rahayfeh, et al. Deep recurrent neural network for IoT intrusion detection system. Simul Model Pract Theor 2020; 101: 102031.

4. Liu, Zhao, Chen, et al. A new machine learning method for identifying Alzheimer’s disease. Simul Model Pract Theor 2020; 99: 102023.

5. Shao. Learning deep and wide: a spectral method for learning deep networks. IEEE T Neu Netw Learn Syst 2014; 25(12): 2303–2308.

6. Philip Chen, Zhang C-Y, Chen, et al. Fuzzy restricted Boltzmann machine for the enhancement of deep learning. IEEE T Fuzzy Syst 2015; 23(6): 2163–2173.

7. Ding, Zhou. Modeling self-adaptive software systems with learning Petri nets. IEEE T Syst Man Cyber: Syst 2016; 46(4): 483–498.

8. Chang C-H. Deep and shallow architecture of multilayer neural networks. IEEE T Neu Netw Learn Syst 2015; 26(10): 2477–2486.

9. Mehdi Vahidipour, Meybodi, Esnaashari. Learning automata-based adaptive Petri net and its application to priority assignment in queuing systems with unknown parameters. IEEE T Syst Man Cyber: Syst 2015; 45(10): 1373–1384.

10. Gong, Liu, et al. A multiobjective sparse feature learning model for deep neural networks. IEEE T Neu Netw Learn Syst 2015; 26(12): 3263–3277.

11. Zhang, Yang, Chen. Deep computation model for unsupervised feature learning on big data. IEEE T Ser Comput 2016; 9(1): 161–171.

12. Zhang, Zheng, Cui, et al. A deep neural network-driven feature learning method for multi-view facial expression recognition. IEEE T Multimedia 2016; 18(12): 2528–2536.

13. Pfeiffer, Shukla, Turchetta, et al. Reinforced imitation: sample efficient deep reinforcement learning for mapless navigation by leveraging prior demonstrations. IEEE Robot Autom Lett 2018; 3(4): 4423–4430.

14. Shen VRL. Knowledge representation using high-level fuzzy Petri nets. IEEE T Syst Man Cyber A: Syst Hum 2006; 36(6): 2120–2127.

15. Dimitriou, Leontaris, Vafeiddis, et al. A deep learning framework for simulation and defect prediction applied in microelectronics. Simul Model Pract Theor 2020; 100: 102063.

16. Duan J-C, Chung F-L. Cascaded fuzzy neural network model based on syllogistic fuzzy reasoning. IEEE T Fuzzy Syst 2001; 9(2): 293–306.

17. Juang C-F, K-C. A recurrent fuzzy network for fuzzy temporal sequence processing and gesture recognition. IEEE T Syst Man Cyber B: Cyber 2005; 35(4): 646–658.

18. Ying. Deriving analytical input-output relationship for fuzzy controllers using arbitrary input fuzzy sets and Zadeh fuzzy and operator. IEEE T Fuzzy Syst 2006; 14(5): 654–662.

19. Zhou, Ying. A method for deriving the analytical structure of a broad class of typical interval type-2 Mamdani fuzzy controllers. IEEE T Fuzzy Syst 2013; 21(3): 447–458.

20. Deng W, Qiu. Supervisory control of fuzzy discrete-event systems for simulation equivalence. IEEE T Fuzzy Syst 2015; 23(1): 178–192.

21. Zhang, Ishibuchi, Wang. Deep Takagi-Sugeno-Kang fuzzy classifier with shared linguistic fuzzy rules. IEEE T Fuzzy Syst 2018; 26(3): 1535–1549.

22. Hagras. Comments on dynamical optimal training for interval type-2 fuzzy neural network (T2FNN). IEEE T Syst Man Cyber B: Cyber 2006; 36(5): 1206–1209.

23. Juang C-F, Chiu S-H, Chang S-W. A self-organizing ts-type fuzzy network with support vector learning and its application to classification problems. IEEE T Fuzzy Syst 2007; 15(5): 998–1008.

24. Juang C-F, Tsao Y-W. A self-evolving interval type-2 fuzzy neural network with online structure and parameter learning. IEEE T Fuzzy Syst 2008; 16: 1411–1424.

25. Wai R-J, Yang Z-W. Adaptive fuzzy neural network control design via a T-S fuzzy model for a robot manipulator including actuator dynamics. IEEE T Syst Man Cyber B: Cyber 2008; 38(5): 1326–1346.

26. Juang C-F, Huang R-B, Lin Y-Y. A recurrent self-evolving interval type-2 fuzzy neural network for dynamic system processing. IEEE T Fuzzy Syst 2009; 17(5): 1092–1105.

27. Kim S-S, Kwak K-C. Development of quantum-based adaptive neuro-fuzzy networks. IEEE T Syst Man Cyber B: Cyber 2010; 40(1): 91–100.

28. Ebadzadeh, Salimi-Badr. IC-FNN: a novel fuzzy neural network with interpretable, intuitive, and correlated-contours fuzzy rules for function approximation. IEEE T Fuzzy Syst 2018; 26(3): 1288–1302.

Deep Petri nets of unsupervised and supervised learning
