Abstract

Reconfigurable security protocols, with dynamic protocol configuration and flexible resource allocation, have become a state-of-the-art technology for guaranteeing the security of space-ground integrated networks. However, reconfiguration decision-making for such protocols remains a major challenge: it must adapt to diverse secure service requirements and deploy security strategies that offer a higher security level but are more complicated, on nodes with limited resources and computing abilities. To address this problem, a hierarchically collaborative ant colony–based reconfiguration decision-making model called HiCoACR is proposed. This model, inspired by the ideas of hierarchical reinforcement learning and population collaboration, decomposes the reconfiguration decision-making problem into two sub-problems by introducing a two-level hierarchical ant colony consisting of the Explorer and the Worker. The Explorer controls the directions of protocol reconfiguration and generates abstract scheduling sub-goals, which are conveyed to the Worker. The Worker schedules the most suitable cryptogram resources for each sub-goal received and produces the optimal reconfiguration solution, which is verified and re-optimized by a Lévy process–based stochastic gradient descent algorithm. Both the Explorer and the Worker adopt a modified version of the ant colony algorithm to fulfill their targets, where a hierarchical pheromone is defined to reinforce the positive behaviors of each ant colony. Experimental results suggest that HiCoACR outperforms baseline algorithms and possesses good model transferability.

Keywords

Reconfiguration decision-making, cryptogram resources scheduling and optimization, hierarchically collaborative ant colony, reconfigurable security protocol, space-ground integrated network

Introduction

Security protocols are a primary means of guaranteeing the security of network services, such as network communication, data transmission, and authentication. However, no single security protocol can sufficiently cater to all the diverse security requirements, especially in the space-ground integrated network (SGIN), where severely limited resources further decrease the possibility of fulfilling diverse security requirements and of deploying security strategies that are more secure but also more complicated. The reconfigurable security protocol (RSP), with adaptive dynamic protocol reconfiguration and flexible cryptogram resource configuration,^1,2 offers a way of alleviating resource constraints, improving resource utilization, and enhancing network security in SGIN. Security protocol reconfiguration (SPR) generates protocols by determining a proper protocol flow and its components, picking cryptogram resources that satisfy the functional requirements and performance indexes of each component, and then assembling the selected resources according to the protocol flow. Since the performance of all components determines the efficiency of the generated target protocol, one of the key issues in SPR is to determine the protocol reconfiguration flow and its components and to schedule optimal cryptogram resources so as to form the target security protocol, which is defined as the reconfiguration decision-making problem (RDMP) in this article.

Generally, the scale of cryptogram resources and the diversity of their design standards, cryptosystems, and application situations enlarge the solution space of RDMP. What is worse, the uncertainty of the protocol flow caused by reconfiguration granularity further intensifies this situation. In addition, RDMP is a combinatorial optimization problem that is nondeterministic polynomial (NP)-hard, where the optimal solution relies on both the reconfiguration protocol flow and the cryptogram resources. However, previous works mainly focus on resource scheduling and ignore the role of the reconfiguration flow in RDMP. Thus, designing an accurate and efficient reconfiguration decision-making mechanism is of great significance to RSP.

To address this problem, a hierarchically collaborative ant colony–based reconfiguration decision-making model called HiCoACR is proposed. The HiCoACR model takes inspiration from hierarchical reinforcement learning^3,4 and the collaborative behaviors of biological populations,⁵ and decomposes the RDMP into two sub-problems: sub-goal generation for resource scheduling and resource scheduling for sub-goals. A hierarchical coordinated ant colony consisting of the Explorer and the Worker is introduced to address these two sub-problems, respectively. In detail, the Explorer controls the directions of protocol reconfiguration and explores abstract resource scheduling sub-goals at a lower temporal resolution in a latent state space, while the Worker schedules proper cryptogram resources for the sub-goals derived from the Explorer and generates the optimal solution for the target protocol. Both the Explorer and the Worker operate with an improved ant colony algorithm, and a hierarchical pheromone is defined to reinforce the positive behaviors of the Explorer and the Worker. In addition, a Lévy theory⁶–based stochastic gradient algorithm is adopted to verify and re-optimize the solution produced by the Explorer and the Worker so as to generate the best reconfiguration solution.

The key contributions of this article include (1) a reconfiguration decision-making model HiCoACR for reconfigurable security protocol that decouples RDMP into scheduling sub-goal generation and resources scheduling for sub-goals; (2) a hierarchical pheromone that indicates the reconfiguration policies of HiCoACR and its updating mechanism; (3) a Lévy flight–based stochastic gradient algorithm to verify and re-optimize the optimal solution.

Related works

In essence, the core of RDMP lies in optimization scheduling, which can be classified into static scheduling and dynamic scheduling. Conventional methods for static scheduling include genetic algorithms,^7–9 particle swarm optimization,^10,11 heuristic search,^12,13 and so on. Dynamic scheduling algorithms include load balancing,^14–16 fuzzy stochastic optimization,^10,17 deep reinforcement learning,^18,19 and so on. A heuristic search algorithm is adopted to find the optimal reconfiguration scheme for the target security protocol in a given solution space.²⁰ A multi-objective evolutionary algorithm (MOEA)–based reconfigurable optimization algorithm is introduced to address energy-saving issues in reconfigurable sensor networks, where a specific MOEA framework is applied to make a reasonable trade-off choice from the set of Pareto-optimal solutions according to preferences and system requirements.²¹ A quality of service (QoS)-aware service composition algorithm is proposed to handle the dynamic web service composition problem by taking advantage of both global optimization and local optimization.
In that work, mixed integer programming is adopted to find the optimal decomposition of global QoS constraints into local constraints, and distributed local selection is introduced to find the best web services that satisfy these local constraints.²² A cuckoo’s breeding behavior–based heuristic method and a natural gene evolution–based heuristic method are proposed to find the best or near-best solution of service composition, where the former applies a 1-OPT heuristic (an algorithm that reaches a new solution by changing a single element of the current solution) to expand the search space in a controlled way, and the latter uses two memory structures to avoid stagnation in a local optimum and to ensure that exploitation and exploration are properly performed.²³ A trust-based heuristic algorithm, T-HEU, combining trust-based selection, convex hulls, and global optimization, is proposed to cope with the service composition problem. In T-HEU, the trust-based selection method is used to filter out untrustworthy component services, convex hulls are used to reduce the search space in the process of service composition, and the heuristic global optimization approach is used to obtain a near-optimal solution.²⁴ A systematic approach based on a fuzzy linguistic preference model and an evolutionary algorithm is put forward to solve service level agreement (SLA)-constrained service composition problems. Specifically, the weighted Tchebycheff distance is first introduced to model this problem, a fuzzy preference model for preference representation and weight assignment is then presented, and two evolutionary algorithms are proposed for service composition.²⁵

Meanwhile, RDMP can also be regarded as a sequential decision task,²⁶ in which multi-step problems must be solved and the consequences of a step may unfold gradually over many subsequent steps and choices. Multi-objective problems have been widely examined in many areas of decision-making.²⁷ In general, model-free reinforcement learning and model-based reinforcement learning are the main solutions. The FeUdal network, a novel architecture for hierarchical reinforcement learning, was proposed to solve sequential decision tasks such as playing computer games (for example, Montezuma’s Revenge), and it provides great inspiration for the method proposed in this article.³ All the aforementioned works serve as inspiration for the computational methods developed in this article.

Formalization of RDMP

Definition 1. The reconfiguration model for SPR is defined as a quadruple $RSP = (RG, RR, RE, RS)$, where $RG$ denotes the reconfiguration goals, depicting the functional and performance requirements of the target security protocol; $RR$ refers to all available cryptogram resources, including cryptocards, cryptographic devices developed on field-programmable gate arrays (FPGAs), and reconfigurable processors; $RE$ represents the reconfiguration efficiency criteria for protocol performance evaluation; and $RS$ indicates the reconfiguration solutions of RDMP for the given protocol. A reconfiguration solution usually consists of a protocol flow and the corresponding cryptogram resources.

Definition 2. RG denotes functional requirements and minimum performance indexes of the target protocol.

As complex protocols are formally derived from basic security components,²⁸ $RG$ can be decomposed into a sub-goal set $RG = \{rg_{1}, rg_{2}, \dots, rg_{n}\}$, where each sub-goal can be matched to a security component, that is, a cryptogram resource. For every $rg_{i} \in RG$, $1 \leq i \leq n$, there exists $req(rg_{i}) = (fc_{i}, pm_{i}, pf_{i})$, where $fc_{i}$, $pm_{i}$, and $pf_{i}$ indicate the functionality, interface, and performance required by sub-goal $rg_{i}$, respectively. There are multiple decomposition schemes for a given $RG$ owing to differences in reconfiguration granularity. Thus, there may exist other sub-goal sets $RG' = \{rg'_{1}, rg'_{2}, \dots, rg'_{m}\}$ that satisfy $RG = \{rg_{1}, rg_{2}, \dots, rg_{n}\} = \{rg'_{1}, rg'_{2}, \dots, rg'_{m}\} = RG'$.

Definition 3. Cryptography reconfigurable cells (CRCs) are independent cryptogram components that are reconfigurable in the functional, structural, and temporal dimensions.

For each CRC, its attribute set is defined as a quintuple $CRC = (id, tp, fc, pf, pm)$, where $id$ uniquely identifies the CRC; $tp$ gives the type of the CRC, such as atom CRC or compound CRC; $fc$ depicts its functionality; $pf$ describes its performance, including execution time, energy consumption, safety level, and so on; and $pm$ denotes its input and output interfaces. All CRCs together constitute the total cryptogram resources $RR = \{crc_{i} \mid i = 1, 2, \dots, N\}$.

Definition 4. RE refers to the overall performance of a cryptogram resource or a security protocol. The efficiency of a security protocol $SP$ is recorded as $RE(SP)$, and the efficiency of a cryptogram resource $crc_{i}$ is recorded as $RE(crc_{i})$. The efficiency of a protocol is usually evaluated using an efficiency model, which is described in section “RE evaluation.”

Definition 5. A reconfiguration solution (RS) depicts the result of reconfiguration decision-making for SPR, which consists of a protocol flow and the set of cryptogram resources constituting the flow.

Based on the Petri net model,²⁹ RS can be formulated as $RS = (State, Res; Flow)$, where $Flow : State \leftrightarrow Res$ denotes the flow of the target protocol, the transition set $Res = \{c_{1}, c_{2}, \dots, c_{n}\}$ contains the components of the target protocol and satisfies $Res \subset RR$, and all resources in $Res$ together form the target protocol according to $Flow$. The place set $State$ refers to the set of system states. As more than one RS may satisfy RG, the solution with the highest RE is regarded as the best RS and recorded as $RS_{best}$.

Problem 1. RDMP for SPR is defined as follows: given the reconfiguration resources $RR = \{crc_{i} \mid i = 1, 2, \dots, N\}$, a target security protocol $rsp$, and the corresponding reconfiguration goal $RG(rsp)$, RDMP is to find the optimal solution $RS_{best} = (\hat{S}, \hat{T}; \hat{F})$ that satisfies the following conditions:

$\hat{T} = \{t_{1}, t_{2}, \dots, t_{m}\} \subset RR$, and $\bigcup_{t \in \hat{T}} t \overset{\hat{F}}{\to} rsp$, where $m$ is the size of $\hat{T}$ and $\bigcup_{t \in \hat{T}} t \overset{\hat{F}}{\to} rsp$ indicates that all cryptogram resources in $\hat{T}$ together form the target protocol $rsp$ according to the protocol flow $\hat{F}$.

There exist a sub-goal set $\{rg_{1}, rg_{2}, \dots, rg_{m}\} = RG$ and a one-to-one mapping $f : \hat{T} \to RG$ such that, for every $rg_{j} \in RG$, there is one and only one $t_{i} \in \hat{T}$ $(i = 1, 2, \dots, m)$ satisfying $fc(t_{i}) = fc(rg_{j})$, $pm(t_{i}) = pm(rg_{j})$, and $pf(t_{i}) \succ pf(rg_{j})$, where $x \succ y$ means that “$x$ is better than $y$,” and $fc(x)$, $pm(x)$, and $pf(x)$ denote the functionality, interfaces, and performance of $x$, respectively.

$pf(RS_{best}) \succ pf(rsp)$, which means that the overall performance of $RS_{best}$ is higher than the overall performance requirements of the target protocol $rsp$.

$\forall RS = (S, T; F)$, $RE(RS_{best}) \succ RE(RS)$.

According to the above definitions, RDMP is to find the best solution with highest RE under the condition of meeting basic functional requirements and performance index. Figure 1 briefly illustrates the RDMP, where dotted lines with different colors represent different reconfiguration solutions and the green dotted line is the best solution.

Figure 1.

Reconfiguration decision-making procedure of security protocol.
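The matching condition between a cryptogram resource and a sub-goal in Problem 1 can be illustrated with a minimal sketch. The class names, field types, and sample resources below are hypothetical and are not part of the original model; in particular, the performance $pf$ is reduced to a single number where larger means better, whereas Definition 3 treats it as a collection of indexes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """req(rg_i) = (fc_i, pm_i, pf_i), following Definition 2."""
    fc: str    # required functionality
    pm: str    # required input/output interface
    pf: float  # minimum performance index (assumption: higher is better)


@dataclass(frozen=True)
class CRC:
    """CRC = (id, tp, fc, pf, pm), following Definition 3."""
    id: str
    tp: str    # "atom" or "compound"
    fc: str
    pf: float
    pm: str


def matches(crc: CRC, req: Requirement) -> bool:
    """Same functionality and interface, and performance better than required."""
    return crc.fc == req.fc and crc.pm == req.pm and crc.pf > req.pf


def candidates(rr: list[CRC], req: Requirement) -> list[CRC]:
    """All resources in RR that could serve one sub-goal."""
    return [c for c in rr if matches(c, req)]


# Invented sample data, for illustration only.
rr = [
    CRC("c1", "atom", "encrypt", 0.9, "block128"),
    CRC("c2", "atom", "encrypt", 0.4, "block128"),
    CRC("c3", "atom", "hash", 0.8, "stream"),
]
goal = Requirement(fc="encrypt", pm="block128", pf=0.5)
feasible_ids = [c.id for c in candidates(rr, goal)]
```

Only resources passing this filter are candidates for a sub-goal; choosing among them so that the overall RE is highest is the actual decision problem.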

The HiCoACR model

HiCoACR is a two-level hierarchical architecture consisting of two modules, the Explorer and the Worker, each of which implements certain sub-tasks of reconfiguration decision-making. The Explorer points out the reconfiguration directions of the protocol flow and generates scheduling sub-goals, while the Worker carries out the cryptogram resource scheduling sub-tasks; together, they cooperatively generate an optimal solution of RDMP.

In detail, at time $t$, the Explorer perceives a latent system state $s_{t+1}$ and generates a cryptogram resource scheduling sub-goal $rg_{t+1}$ $(0 \leq t \leq K-1)$, where $K$ is the number of sub-goals. The sub-goal $rg_{t+1}$ points out the reconfiguration direction, and all sub-goals $\{rg_{t+1} \mid 0 \leq t \leq K-1\}$ together cover the overall protocol goal RG and also indicate the components of the selected protocol flow. The Worker follows the direction given by sub-goal $rg_{t+1}$ and schedules the optimal resource $c'_{t+1}$ for $rg_{t+1}$ according to $s_{t+1}$. Then all components $\{c'_{t+1} \mid 0 \leq t \leq K-1\}$ are combined to produce the candidate optimal solution. Finally, a stochastic gradient algorithm based on Lévy flight is applied to verify and optimize the candidate solution so as to obtain the final optimal solution.

Both the sub-goal generation process in the Explorer and the resource scheduling process in the Worker are trained by a modified version of the ant colony algorithm. Meanwhile, a hierarchical pheromone $phe$ is introduced to reinforce the positive behaviors of the Explorer and the Worker during training and to produce the reconfiguration policy during reconfiguration decision-making. In essence, the greater the RE of an RS, the higher the concentration of $phe$ on its protocol flow and resources, so that the resulting policies select that flow and those resources with higher probability.

The overall design of HiCoACR model is illustrated in Figure 2 and equations (1)–(6) depict the forward dynamics of HiCoACR model

$$s_{t+1}, rg_{t+1}, h_{t+1}^{E} = f^{Eant}(RG, s_{t}, h_{t}^{E}) \tag{1}$$

$$c_{t+1}, h_{t+1}^{W} = f_{lant}^{W}(rg_{t+1}) \tag{2}$$

$$RS_{i} = \bigcup c_{t}; \quad RS_{0}^{ca} = f_{Merge}^{W}\left(\bigcup RS_{i}\right) \tag{3}$$

$$RS_{n}^{ca}, h_{t+1}^{Wg} = f_{gant}^{W}(RS_{0}^{ca}, h_{t}^{Wg}) \tag{4}$$

$$RS_{best} = \arg\max(RE(rs_{lb}), RE(RS_{best})) \tag{5}$$

$$phe = f^{phe}(phe, \nabla RE); \quad \pi_{t} = f^{trans}(phe, s_{t}) \tag{6}$$

where $h_{t}^{E}$, $h_{t+1}^{E}$, $h_{t}^{W}$, $h_{t+1}^{W}$, $h_{t}^{Wg}$, and $h_{t+1}^{Wg}$ are intermediate variables, $i$ in equation (3) indicates the $i$th epoch of reconfiguration solution exploration, and $n$ in equation (4) is the number of solution updates. Function $f^{Eant}$ in equation (1) generates the next sub-goal from the current system state and RG, as detailed in section “Sub-goal generation.” Function $f_{lant}^{W}$ in equation (2) obtains the best-matched cryptogram resource for the sub-goal generated in equation (1), as described in the first part of section “Cryptogram resources scheduling.” Equation (3) combines all the resources in one epoch to get a feasible solution, and function $f_{Merge}^{W}$ computes the candidate solution by merging the feasible solutions obtained in different epochs, as described in the second part of section “Cryptogram resources scheduling.” Equation (4) verifies and optimizes the candidate solution with function $f_{gant}^{W}$, and equation (5) chooses the final best solution between the optimized candidate solution and the historical best solution. The details of function $f_{gant}^{W}$ are given in the third part of section “Cryptogram resources scheduling.” Equation (6) details the rules of hierarchical pheromone updating and policy generation, which are described in section “Hierarchical pheromone updating.”

Figure 2.

Reconfiguration decision-making model based on hierarchically collaborative ant colony algorithm.

Sub-goal generation

The intention of the Explorer is to produce a sequence of sub-goals guiding the directions of reconfiguration. Each sub-goal represents a component of the protocol flow. As mentioned above, there may be multiple flows corresponding to the target protocol due to differences in reconfiguration granularity. Taking the protocol flows in Figure 3 as an example, there are seven distinct flows between state $s_{i-1}$ and state $s_{i+3}$. At state $s_{i}$, there are three alternative sub-goals for constructing the protocol. The Explorer selects one and generates the sub-goal $rg_{i}$. When the sub-goal $rg_{i}$ is accomplished by the Worker, the state transition from $s_{i}$ to $s_{i+1}$ is triggered.

Figure 3.

Protocol flows and state transition.

Given a security protocol, its RG can be expressed as a goal vector $RG \in \mathbb{R}^{k}$. Each cryptogram resource $c \in RR$ satisfies the requirements of one sub-goal. Thus, the Explorer first produces a sub-goal vector for each cryptogram resource, and the sub-goal vectors of all cryptogram resources form a matrix.

To generate proper sub-goals, the Explorer adopts an ant colony algorithm with Q-learning, which combines the advantages of optimization theory, non-linear control, and reinforcement learning. The Explorer takes the current system state $s_{i}$ and $RG$ as input and outputs the sub-goal. In this algorithm, when producing sub-goals, an adaptive pseudorandom ratio selection rule is applied with probability $\theta_{0}$, and the sub-goal generation policy indicated by the flow pheromone is adopted with probability $1 - \theta_{0}$. Moreover, sub-goal generation policies are trained by updating the flow pheromone, as detailed in section “Hierarchical pheromone updating.” At each system state, the Explorer generates a sub-goal according to the policies given in equations (7) and (8)

$$f^{Eant}(s_{t}) = \begin{cases} \arg\max\left([\tau_{ik}(t)]^{\alpha_{1}} \cdot [\eta_{ik}(t)]^{\beta_{1}}\right), & \theta \leq \theta_{0} \\ p_{ij}^{k}(t), & \theta > \theta_{0} \end{cases} \tag{7}$$

$$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha_{1}} \cdot [\eta_{ik}(t)]^{\beta_{1}}}{\sum_{s \subset allowed_{i}} [\tau_{is}(t)]^{\alpha_{1}} \cdot [\eta_{is}(t)]^{\beta_{1}}}, & k \subset allowed_{i} \\ 0, & k \not\subset allowed_{i} \end{cases} \tag{8}$$

where $\tau_{ij}(t)$ is the amount of pheromone deposited for the sub-goal that triggers state transition $ij$, and $\alpha_{1} \geq 0$ is the heuristic factor reflecting the relative importance of the pheromone $\tau_{ij}(t)$ in guiding the sub-goal generation process. $\eta_{ik}(t) = 1 / RE(res_{ik})$ is the heuristic function indicating the expectation that the sub-goal triggering state transition $ij$ is generated, and $\beta_{1} \geq 1$ is the expectation heuristic factor reflecting the relative importance of the heuristic function $\eta_{ik}(t)$ in guiding the sub-goal generation process. $allowed_{i}$ denotes the set of states to which state $i$ may transfer. Sub-goal generation is repeated until the end of the target protocol is reached.
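The selection rule of equations (7) and (8) can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name, the dictionary-based representation, and the default parameter values are assumptions, and $\theta$ is taken to be a fresh uniform random number in $[0, 1)$ at every call.

```python
import random


def select_next(tau, eta, allowed, alpha=1.0, beta=2.0, theta0=0.9, rng=random):
    """Pick the next state from `allowed`, in the spirit of equations (7) and (8).

    tau, eta: dicts mapping each candidate state to its pheromone and
    heuristic value (hypothetical representation).
    """
    if not allowed:
        raise ValueError("no reachable state")
    weights = {k: (tau[k] ** alpha) * (eta[k] ** beta) for k in allowed}
    if rng.random() <= theta0:
        # First branch of equation (7): take the arg max.
        return max(weights, key=weights.get)
    # Second branch: roulette-wheel draw with the probabilities of equation (8).
    total = sum(weights.values())
    r = rng.random() * total
    acc = 0.0
    for k, w in weights.items():
        acc += w
        if r <= acc:
            return k
    return k  # guard against floating-point round-off
```

With `theta0=1.0` the rule is purely greedy, while smaller values leave more room for exploration through the pheromone-weighted draw.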

Cryptogram resources scheduling

Once it receives a sub-goal conveyed from the Explorer, the Worker first computes the best cryptogram resource matching the sub-goal. Then, the cryptogram resources for all sub-goals together constitute the candidate optimal solution for RDMP. Finally, the candidate optimal solution is verified and optimized to obtain the final optimal solution.

Cryptogram resource matching for sub-goal

The scenario of cryptogram resource matching for a sub-goal is depicted in Figure 4: for any sub-goal $rg_{i}$ $(i = 1, 2, \dots, t)$, there may exist a set of cryptogram resources $RR_{i} \subseteq RR$ that satisfy the functional requirements and performance indexes of $rg_{i}$. Cryptogram resource matching aims at finding the most suitable cryptogram resource for each received sub-goal.

Figure 4.

Scenario of cryptogram resource matching for sub-goals.

Similarly, to address this issue, an ant colony algorithm involving only a single cryptogram resource matching step is introduced. Each matching process aims at finding the preponderant cryptogram resource for the current sub-goal. The rules for cryptogram resource matching are depicted in equations (9) and (10)

$$f_{lant}^{W}(rg_{t}) = \begin{cases} \arg\max\left([\tau_{c_{i}}(t)]^{\alpha_{2}} \cdot [\eta_{c_{i}}(t)]^{\beta_{2}}\right), & \theta \leq \delta \\ p_{c_{i}}(t), & \theta > \delta \end{cases} \tag{9}$$

$$p_{c_{i}}(t) = \begin{cases} \dfrac{[\tau_{c_{i}}(t)]^{\alpha_{2}} \cdot [\eta_{c_{i}}(t)]^{\beta_{2}}}{\sum_{c_{j} \subset MT(rg_{i})} [\tau_{c_{j}}(t)]^{\alpha_{2}} \cdot [\eta_{c_{j}}(t)]^{\beta_{2}}}, & c_{i} \subset MT(rg_{i}) \\ 0, & c_{i} \not\subset MT(rg_{i}) \end{cases} \tag{10}$$

where $\delta \in [0, 1]$. When $\theta \leq \delta$, the first branch of equation (9) is adopted, which picks the resource maximizing the product of pheromone and heuristic value; otherwise, a resource is drawn according to the matching probability given in equation (10). $\tau_{c_{i}}(t)$ is the amount of pheromone for cryptogram resource $c_{i}$ and $\eta_{c_{i}}(t)$ is the heuristic function indicating the expectation that $c_{i}$ matches $rg_{i}$. $MT(rg_{i})$ is the set of resources matching sub-goal $rg_{i}$. After each matching process, $c_{i}$ is set to be the $i$th component of the solution. $\alpha_{2} \geq 0$ is the heuristic factor reflecting the relative importance of the accumulated pheromone $\tau_{c_{i}}(t)$ in guiding the resource matching process, and $\beta_{2} \geq 1$ is the expectation heuristic factor reflecting the relative importance of the heuristic function $\eta_{c_{i}}(t)$. All components in an epoch constitute a solution for RDMP.
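As a purely illustrative calculation of equation (10), with invented values that are not taken from the experiments, suppose that $MT(rg_{i}) = \{c_{1}, c_{2}\}$ with $\tau_{c_{1}} = 1$, $\tau_{c_{2}} = 2$, $\eta_{c_{1}} = 0.5$, $\eta_{c_{2}} = 0.25$, $\alpha_{2} = 1$, and $\beta_{2} = 2$. Then

$$[\tau_{c_{1}}]^{\alpha_{2}} [\eta_{c_{1}}]^{\beta_{2}} = 1 \times 0.25 = 0.25, \qquad [\tau_{c_{2}}]^{\alpha_{2}} [\eta_{c_{2}}]^{\beta_{2}} = 2 \times 0.0625 = 0.125$$

$$p_{c_{1}} = \frac{0.25}{0.25 + 0.125} = \frac{2}{3}, \qquad p_{c_{2}} = \frac{0.125}{0.25 + 0.125} = \frac{1}{3}$$

In this example, the first branch of equation (9) would return $c_{1}$, whereas the second branch would still select $c_{2}$ one time in three, even though $c_{2}$ carries more pheromone, because $\beta_{2} = 2$ gives the heuristic value more weight.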

Candidate solution producing

Candidate solution production follows the idea of pooling group wisdom. Once all epochs are completed, the preponderant components of each solution are incorporated as follows: all generated solutions are embedded into preponderant-component extraction vectors using a linear projection $\phi$, and all vectors are then pooled by summation to produce the candidate solution for the RDMP. Let $RS_{0}^{ca}$ be the candidate optimal solution and $RS_{i}$ the $i$th solution; then

$$RS_{0}^{ca} = \sum_{i=1}^{K} \phi(RS_{i}) \tag{11}$$

where $K$ denotes the number of epochs and $RE(RS_{0}^{ca}) \succ RE(RS_{i})$ holds.
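One possible reading of equation (11) is sketched below. The projection $\phi$ is not specified further in the text, so the sketch assumes the simplest choice: each solution is one-hot encoded per sub-goal slot, the encodings are summed over epochs (which amounts to counting votes), and the candidate is decoded by keeping the resource with the largest pooled value in each slot. The decoding step, the tie-breaking rule, and the assumption that all epochs share the same number of sub-goals are illustrative choices, not part of the original model.

```python
from collections import Counter


def merge_solutions(solutions):
    """Pool K epoch solutions into one candidate, slot by slot.

    solutions: list of equal-length lists; solutions[i][t] is the id of the
    resource chosen for sub-goal t in epoch i (hypothetical representation).
    """
    if not solutions:
        raise ValueError("need at least one solution")
    n_slots = len(solutions[0])
    candidate = []
    for t in range(n_slots):
        # Summing one-hot embeddings is the same as counting votes per resource.
        votes = Counter(sol[t] for sol in solutions)
        # Decode: keep the resource with the largest pooled value
        # (ties broken by resource id so that the result is deterministic).
        best, _ = max(votes.items(), key=lambda kv: (kv[1], kv[0]))
        candidate.append(best)
    return candidate
```

A weighted variant, in which each solution votes in proportion to its RE, would fit the same pattern by replacing the counter with a weighted sum.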

Solution verification and optimization

Because the candidate solution may be a local optimum rather than the global one, a verification and optimization process is indispensable for obtaining the true optimal solution. To address this issue, a Lévy flight⁶–based stochastic gradient algorithm is employed. The main idea is to introduce the advantages of Lévy flight, namely frequent short-distance steps and occasional long-distance steps, into stochastic gradient policies.

Assume $RS_{0}^{ca}$ to be the initial candidate solution and $RS_{i}^{ca}$ to be the $i$th update of $RS_{0}^{ca}$; the solution updating rule is formulated as follows

$$RS_{i+1}^{ca} = \left(RE(RS_{i}^{ca}) - RE(RS_{lbest})\right) \oplus \sigma \oplus \text{Lévy}(\xi) + RS_{i}^{ca} \tag{12}$$

where $\sigma$ is the step factor controlling the range of optimization ($\sigma = 1$ can be used in most cases), $RS_{lbest}$ denotes the historically best candidate solution, and $\text{Lévy}(\xi)$ provides a random step length drawn from a Lévy distribution

$$\text{Lévy}(\xi) \sim \mu = t^{-1-\xi}, \quad 0 < \xi \leq 2 \tag{13}$$

For ease of calculation, equation (14) is adopted to calculate the Lévy random number

$$\text{Lévy}(\xi) \sim \frac{\phi \times \mu}{|\nu|^{1/\xi}} \tag{14}$$

where $\mu$ and $\nu$ follow the normal distribution, $\xi = 1.5$, and

$$\phi = \left(\frac{\Gamma(1+\xi) \times \sin(\pi \times \xi / 2)}{\Gamma\left(\frac{1+\xi}{2}\right) \times \xi \times 2^{(\xi-1)/2}}\right)^{1/\xi} \tag{15}$$
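Equations (14) and (15) correspond to what is commonly known as Mantegna's algorithm for generating Lévy-distributed step lengths, and they can be sketched in a few lines. The function names are hypothetical; the sketch assumes that $\mu$ and $\nu$ are standard normal variables, and, for the update of equation (12), that a solution can be represented as a real-valued vector with $\oplus$ read as entry-wise scaling. The model itself works on protocol flows and resources, so the last function only illustrates the shape of the update.

```python
import math
import random


def mantegna_phi(xi=1.5):
    """Scale factor phi of equation (15)."""
    num = math.gamma(1 + xi) * math.sin(math.pi * xi / 2)
    den = math.gamma((1 + xi) / 2) * xi * 2 ** ((xi - 1) / 2)
    return (num / den) ** (1 / xi)


def levy_step(xi=1.5, rng=random):
    """One random step length following equation (14)."""
    mu = rng.gauss(0.0, 1.0)  # assumption: standard normal
    nu = rng.gauss(0.0, 1.0)  # assumption: standard normal
    while nu == 0.0:          # avoid division by zero
        nu = rng.gauss(0.0, 1.0)
    return mantegna_phi(xi) * mu / abs(nu) ** (1 / xi)


def levy_update(x, re_x, re_best, sigma=1.0, xi=1.5, rng=random):
    """Equation (12) on a real-valued vector x (assumed representation)."""
    scale = (re_x - re_best) * sigma
    return [v + scale * levy_step(xi, rng) for v in x]
```

For $\xi = 1.5$ the scale factor evaluates to roughly 0.6966. When the current solution is already as efficient as the historical best, the scale term vanishes and the solution is left unchanged.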

To speed up the convergence of the optimization, some solutions are discarded and replaced by new solutions with a certain probability. These new solutions are generated by a random walk policy as follows

$$RS_{i+1}^{ca} = RS_{i}^{ca} + \gamma_{0}\left(RS_{i}^{ca} - RS_{k}^{ca}\right) \tag{16}$$

where ${RS}_{k}^{ca}$ is a discarded solution and $γ_{0}$ is the step factor.

After the aforementioned solution optimization, assume ${RS}_{n}^{ca}$ to be the optimized solution, and then the final optimal solution $R S_{best}$ can be obtained with equation (17)

$$RS_{best} = \arg\max(RE(RS_{best}), RE(RS_{n}^{ca})) \tag{17}$$

Hierarchical pheromone updating

In HiCoACR, besides the proposed hierarchical coordinated ant colony algorithm, a hierarchical pheromone is defined for reinforcing the positive feedback of the Explorer and the Worker and for sharing wisdom among populations as well. In addition, hierarchical pheromone points out the reconfiguration policies of RDMP. The hierarchical pheromone is defined as follows

$$phe = \{phe^{f}, phe^{r}\} \tag{18}$$

where the upper pheromone $phe^{f} = \{phe_{ij}^{f} \mid i, j \in N\}$ denotes the pheromone on sub-goals and $phe_{ij}^{f}$ is the amount of pheromone on the sub-goal triggering state transition $ij$, which is recorded as $\tau_{ij}(t)$ in equations (7) and (8). The lower pheromone $phe^{r} = \{phe_{i}^{r} \mid i \in N, 0 < i < Num\}$ refers to the pheromone on cryptogram resources and $phe_{i}^{r}$ is the amount of pheromone on the $i$th cryptogram resource, which is recorded as $\tau_{c_{i}}(t)$ in equations (9) and (10).

Once sub-goal generating or cryptogram resource matching is finished, a pheromone updating process is triggered. Considering the difference of ant colony algorithms used in the Explorer and the Worker, different pheromone updating rules are adopted.

Pheromone updating for sub-goals

Pheromone updating for sub-goals occurs mainly in the sub-goal generation phase. As it is mentioned in section “Sub-goal generation,” sub-goal generation adopts the ant colony algorithm with Q-learning. Once the sub-goal is generated, the pheromone of this sub-goal is updated according to the following rule

$$phe_{ij}^{f}(t+n) = (1 - \rho_{f})\, phe_{ij}^{f}(t) + \rho_{f}\left(\Delta phe_{ij}^{f} + \gamma_{1} \cdot \max_{k \in allowed_{j}} phe_{ik}^{f}(t)\right) \tag{19}$$

$$\Delta phe_{ij}^{f} = \frac{RE(X_{best}^{t}) - RE(X_{best}^{t+n})}{RE(X_{best}^{t+n})} \tag{20}$$

where $\rho_{f} \in (0, 1)$ denotes the volatilization factor of the protocol flow pheromone, whose value directly affects the direction-controlling ability of the Explorer and the convergence speed of the ant colony; $\Delta phe_{ij}^{f}$ is the increment of the pheromone on the sub-goal triggering state transition $ij$; and $\gamma_{1}$ is the discount factor for computing the accumulated reward. After the model is trained, the sub-goal pheromone may form a hypergraph and can be converted into sub-goal generation policies using equation (8).
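The update of equations (19) and (20) can be sketched as a single function. The function name, the dictionary-based pheromone table, the default parameter values, and the choice of treating missing entries as zero are assumptions made for illustration; the indices follow equation (19) as printed.

```python
def update_flow_pheromone(phe, i, j, allowed_j, re_old, re_new,
                          rho_f=0.1, gamma1=0.9):
    """Apply equations (19) and (20) to the entry phe[(i, j)] in place.

    phe: dict mapping a transition (i, k) to its pheromone amount.
    allowed_j: states reachable from state j.
    re_old, re_new: RE of the best solution at time t and at time t + n.
    """
    delta = (re_old - re_new) / re_new  # equation (20)
    # Discounted look-ahead term, indexed as in equation (19);
    # transitions without an entry are treated as zero (assumption).
    future = max((phe.get((i, k), 0.0) for k in allowed_j), default=0.0)
    phe[(i, j)] = (1 - rho_f) * phe[(i, j)] + rho_f * (delta + gamma1 * future)
    return phe[(i, j)]
```

For example, with `phe = {(0, 1): 1.0, (0, 2): 2.0}`, `rho_f = 0.5`, `gamma1 = 0.5`, `re_old = 2.0`, and `re_new = 1.0`, updating the transition `(0, 1)` with `allowed_j = [2]` yields $0.5 \times 1.0 + 0.5 \times (1.0 + 0.5 \times 2.0) = 1.5$.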

Updating of pheromone for cryptogram resources

Pheromone updating for cryptogram resources is triggered on two occasions, namely the cryptogram resource matching phase and the solution verification and optimization phase. During the cryptogram resource matching phase, when the most suitable cryptogram resource for a received sub-goal is found, the pheromone for this resource is updated. Meanwhile, when the optimal solution is generated in the solution verification and optimization phase, the pheromones for all cryptogram resources constituting the optimal solution are updated. The updating rule can be formulated as follows

$$phe_{c}^{r}(t+n) = (1 - \rho_{r})\, phe_{c}^{r}(t) + \Delta phe_{c}^{r} \tag{21}$$

where $\rho_{r} \in (0, 1)$ refers to the volatilization factor of the resource pheromone, which directly affects how often previously selected resources are chosen again as the most suitable resources, and $\Delta phe_{c}^{r}$ is the increment of pheromone on cryptogram resource $c$. In the cryptogram resource matching phase, $\Delta phe_{c}^{r} = W / RE(c)$, while in the solution verification and optimization phase, $\Delta phe_{c}^{r} = W / RE(X_{lbest})$, where $X_{lbest}$ is the local optimal solution in an epoch and $W$ is a constant. Once the hierarchical pheromone is trained, it can be converted into resource scheduling policies.
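As a purely illustrative calculation of equation (21) in the matching phase, with invented values that are not taken from the experiments, let $\rho_{r} = 0.1$, $phe_{c}^{r}(t) = 1.0$, $W = 1$, and $RE(c) = 4$. Then

$$\Delta phe_{c}^{r} = \frac{W}{RE(c)} = 0.25, \qquad phe_{c}^{r}(t+n) = (1 - 0.1) \times 1.0 + 0.25 = 1.15$$

so the selected resource loses a tenth of its pheromone to volatilization and gains the increment $\Delta phe_{c}^{r}$, leaving it with more pheromone than before the update.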

Model details

RE evaluation

RE evaluation is the foundation of model training in reconfiguration decision-making, as it evaluates the efficiency of both cryptogram resources and security protocols. For ease of RE evaluation, the evaluation indexes used in this article include E-time, security level, resources consumption, energy consumption, and so on. E-time refers to the time that a protocol or cryptogram resource takes to fulfill a certain functionality. Security level indicates the security strength of a protocol or cryptogram component. Resources consumption represents the resources used by a protocol, such as memory and central processing units (CPUs). Thus, with these main indexes, an RE vector $\vec{re}$ can be constructed according to Definition 4

$$\vec{re} = (time, sec, cpu, memory, energy) \tag{22}$$

where $time$, $sec$, $cpu$, $memory$, and $energy$ refer to E-time, security level, CPU consumption, memory consumption, and energy consumption, respectively. These indexes can be classified into two categories, namely efficiency gain indexes and efficiency loss indexes. For instance, security level is an efficiency gain index: the higher the security level, the higher the RE. E-time and resources consumption are efficiency loss indexes: the lower the E-time and resources consumption, the higher the RE. The goal of reconfiguration decision-making is to decrease efficiency loss and increase efficiency gain so as to find the solution with the highest RE.

Without loss of generality, a linear model is adopted as efficiency evaluation model to computer the RE of a security protocol or a cryptogram resource

RE (x) = \vec{re (x)} \otimes {(\vec{ω})}^{T}

(23)

where $RE (x)$ denotes the RE of $x$ , $\vec{ω}$ is the coefficient vector of the efficiency indexes, and $\vec{re (x)}$ is the efficiency vector of $x$ . The efficiency evaluation model can be extended or modified to adapt to different practical application scenarios. An incentive function $\tanh (RE (x))$ can also be introduced to highlight the efficiency difference between two solutions.

However, because of the nature of the efficiency indexes, the components of $\vec{re}$ for a security protocol cannot all be obtained by a simple weighted summation over its cryptogram resources. Instead, each component of $\vec{re}$ is calculated with its own rule. The value of $time$ is equivalent to the shortest time problem in an activity on edge (AOE) network. For $cpu$ , $memory$ , and $energy$ , weighted summation is sufficient. For $\sec$ , the short board (weakest link) effect is the basis of calculation, that is, the weakest component determines the security level of the protocol.
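The linear model of equation (23) can be sketched in a few lines. This is only an illustration: it reads the product in equation (23) as an inner product, and it assumes that all indexes are normalized to $[0, 1]$ and that loss indexes carry negative weights while the gain index carries a positive one. The weight values and the names `re_score` and `incentive` are hypothetical.

```python
import math

# Order of the indexes in the efficiency vector, following equation (22).
INDEXES = ("time", "sec", "cpu", "memory", "energy")


def re_score(re_vec, weights):
    """Linear efficiency model: inner product of efficiency vector and weights."""
    if len(re_vec) != len(weights):
        raise ValueError("efficiency vector and weight vector differ in length")
    return sum(r * w for r, w in zip(re_vec, weights))


def incentive(re_value):
    """Optional incentive function tanh(RE(x)) mentioned in the text."""
    return math.tanh(re_value)


# Hypothetical weights: negative for loss indexes, positive for the gain index.
WEIGHTS = (-0.3, 0.4, -0.1, -0.1, -0.1)

# Two illustrative resources: a is faster and more secure than b.
a = re_score((0.2, 0.9, 0.3, 0.2, 0.1), WEIGHTS)
b = re_score((0.6, 0.5, 0.3, 0.2, 0.1), WEIGHTS)
```

Because $\tanh$ is monotonically increasing, applying `incentive` preserves the ranking of the two candidates.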
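The aggregation rules above can be illustrated with a small sketch. It assumes that the protocol flow is given as an acyclic AOE network whose edges carry the E-time of each component, and it reads the time rule as the earliest possible completion time, which equals the length of the critical (longest) path. Unit weights are used for the summed indexes. The names `completion_time` and `protocol_re_vector` and all numbers are hypothetical.

```python
from collections import defaultdict, deque


def completion_time(edges):
    """Length of the critical (longest) path in an acyclic AOE network.

    edges -- list of (from_event, to_event, duration)
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))
    earliest = {n: 0.0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:  # process events in topological order
        u = queue.popleft()
        visited += 1
        for v, d in succ[u]:
            earliest[v] = max(earliest[v], earliest[u] + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("AOE network contains a cycle")
    return max(earliest.values(), default=0.0)


def protocol_re_vector(edges, components):
    """Combine per-component indexes into the RE vector of a protocol."""
    return {
        "time": completion_time(edges),
        "sec": min(c["sec"] for c in components),      # weakest component
        "cpu": sum(c["cpu"] for c in components),      # unit weights assumed
        "memory": sum(c["memory"] for c in components),
        "energy": sum(c["energy"] for c in components),
    }


# Illustrative protocol with two parallel branches between start and end.
edges = [("start", "a", 2.0), ("start", "b", 3.0),
         ("a", "end", 4.0), ("b", "end", 1.0)]
components = [
    {"sec": 3, "cpu": 0.2, "memory": 16, "energy": 1.5},
    {"sec": 2, "cpu": 0.1, "memory": 8, "energy": 0.5},
]
vec = protocol_re_vector(edges, components)
```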

Reconfiguration decision-making algorithm based on HiCoACR

Once HiCoACR is trained, it can be used to address the RDMP. The reconfiguration decision-making algorithm takes the cryptogram resources and the trained hierarchical pheromone as input, and outputs the optimal resource set and the corresponding protocol flow for the target protocol.

In detail, the algorithm begins at the initial system state $s_{0}$ . The Explorer then generates a sub-goal according to the sub-goal generation policies derived from the sub-goal pheromone, and the Worker schedules the best cryptogram resource for this sub-goal with the scheduling policies derived from the cryptogram resources pheromone. The Explorer and the Worker repeat these steps until the system reaches the end state $s_{end}$ , where all components are scheduled and can constitute the whole protocol. The procedure is given in Algorithm 1.

Algorithm 1. Reconfiguration decision-making algorithm.
Require: Cryptogram resources $RR$ , hierarchical pheromone $phe$
Output: Optimal resource set $RS$
1: $s_{current} \leftarrow s_{0}; RS \leftarrow \emptyset$
2: $π_{flow} \leftarrow PolicyCalculation ({phe}^{f})$ ; $π_{Res} \leftarrow PolicyCalculation ({phe}^{r})$
3: while $s_{current} \neq s_{end}$ do
4:     $rg_{current} \leftarrow SubgoalGeneration (s_{current}, π_{flow})$
5:     $c \leftarrow ResourceMatching (rg_{current}, RR, π_{Res})$
6:     $RS \leftarrow RS + c$
7:     $s_{current} \leftarrow s_{current} + rg_{current}$
8: end while
9: output ( $RS$ )

Here, $PolicyCalculation$ calculates policies from the hierarchical pheromone through equations (8) and (10). More details of the functions $SubgoalGeneration$ and $ResourceMatching$ are given in sections “Sub-goal generation” and “Cryptogram resource matching for sub-goal,” respectively. With this algorithm, an optimal solution for the RDMP can finally be obtained.
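The control flow of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the policies are assumed to be already computed and stored in plain dictionaries, the end state is represented by a state with no entry in the flow policy, and the Worker picks greedily by score rather than by the selection rules of the article. All names and data are hypothetical.

```python
def reconfiguration_decision(resources, flow_policy, res_policy,
                             s0=(), max_steps=1000):
    """Greedy roll-out of Algorithm 1 with already-trained policies.

    resources   -- dict: sub-goal -> list of candidate resource ids (RR)
    flow_policy -- dict: state -> next sub-goal; a state without an entry
                   is treated as the end state s_end
    res_policy  -- dict: (sub-goal, resource id) -> preference score
    """
    state, selected = tuple(s0), []
    for _ in range(max_steps):
        if state not in flow_policy:           # reached s_end
            return selected
        subgoal = flow_policy[state]           # Explorer: SubgoalGeneration
        candidates = resources.get(subgoal, [])
        if not candidates:
            raise LookupError(f"no resource can serve sub-goal {subgoal!r}")
        # Worker: ResourceMatching, here the highest-scoring candidate.
        best = max(candidates,
                   key=lambda c: res_policy.get((subgoal, c), 0.0))
        selected.append(best)                  # RS <- RS + c
        state = state + (subgoal,)             # s_current <- s_current + rg_current
    raise RuntimeError("end state not reached within max_steps")


# Illustrative two-step protocol flow: hash first, then encrypt.
flow_policy = {(): "hash", ("hash",): "encrypt"}
resources = {"hash": ["MD5", "SHA-2"], "encrypt": ["DES", "AES"]}
res_policy = {("hash", "SHA-2"): 0.9, ("hash", "MD5"): 0.2,
              ("encrypt", "AES"): 0.8, ("encrypt", "DES"): 0.1}
solution = reconfiguration_decision(resources, flow_policy, res_policy)
```

The `max_steps` guard is an addition of this sketch so that a malformed flow policy cannot loop forever.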

Experiments and discussion

Experiments setup

To explore the properties of HiCoACR, experiments are carried out to analyze its convergence, performance, and transferability. The subjects of our experiments include a reconfigurable cryptogram resources set and a security protocols set. Table 1 lists a simplified reconfigurable cryptogram resources set, which includes a large number of atomic and compound cryptogram resources in diverse forms such as cryptocards, cryptographic devices developed on FPGAs or reconfigurable processors, and so on. The development platforms, performances, and other attributes are omitted from Table 1. For any resource listed in Table 1, there may be multiple corresponding physical devices with different performances.

Table 1.

Simplified reconfigurable cryptogram resources.

Security service	Cryptogram algorithm and components
Confidentiality	S-DES, DES, 3-DES, IDEA, Blowfish, CAST-128, CAST-256, A5/1, A5/2, RC4, RC5, RC6, ElGamal, RSA, AES, ECC
Integrity	MD4, MD5, SHA-1, SHA-2, RIPEMD-160
Authentication	Hash, RSA, Schnorr, ElGamal, DSS, ECC
Anti-replay	Linear Congruence, Normal distribution, Monte
Non-repudiation	Hash, RSA, Schnorr, ElGamal, DSS, ECC

The security protocols adopted to test HiCoACR are listed in Table 2 and include authentication protocols, communication protocols, and key agreement protocols. The convergence and performance experiments are mainly run on the authentication protocols, while the communication protocols and key agreement protocols are used to demonstrate the transferability of HiCoACR. Details of these security protocols can be found in the related references.

Table 2.

Security protocols.

Categories	Protocols
Authentication	Hsieh’s scheme,³⁰ Jiang’s scheme,³¹ Priauth,³² Roy-Chowdhury’s scheme,³³ and Chen’s scheme³⁴
Communication	Mahmoud’s scheme,³⁵ Khalil’s scheme,³⁶ and RiSeG³⁷
Key agreement	Tseng’s scheme³⁸ and Amin’s scheme³⁹

The simulation mainly includes three phases. The first phase is data preparation, where data such as the functionality, interface, and performance of all cryptogram resources and the composition of the target protocols have to be provided. In practice, we extract these data and organize them in JSON format instead of operating physical devices, which simplifies the training process. Resource files with different numbers of cryptogram resources are prepared to simulate different scales of cryptogram resources.

The second phase is model adjusting, where the inputs and parameters of our algorithm are adjusted to simulate different scenarios. For instance, the scale of cryptogram resources is varied by inputting different resource files when testing its influence on the convergence rate. When examining the influence of the pheromone volatility factors on the convergence rate, parameters $ρ_{f}$ in equation (19) and $ρ_{r}$ in equation (21) are set to different values between 0 and 1. The parameters in the simulations are initialized as follows. Set $α_{1} = 2, β_{1} = 2, θ_{0} = 0.2$ for equations (7) and (8). Set $α_{2} = 1, β_{2} = 4, δ = 0.3$ for equations (9) and (10). Set $σ = 1, ξ = 1.5, γ_{0} = 1.5$ for equations (12)–(15). Set $ρ_{f} = 0.4, ρ_{r} = 0.35, γ_{1} = 0.5, W = 10$ for equations (19)–(21). The number of ants is 50 in both the Explorer and the Worker.

The third phase is model training, where the convergence, performance, and model transferability are analyzed. During this phase, the proposed model is executed and the best reconfiguration solution for the target protocol is generated. Meanwhile, the policies for reconfiguration decision-making are stored so that they can be transferred to reconfiguration decision-making for other target protocols.
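For reproducibility, the initial parameter values listed above can be collected in one configuration object. The sketch below is only one possible layout: the values are taken from the text, whereas the key names, the grouping by equation range, and the helper `validate` are hypothetical.

```python
# Parameter values as listed in the text; key names are hypothetical.
HICOACR_PARAMS = {
    "eq_7_8": {"alpha1": 2, "beta1": 2, "theta0": 0.2},
    "eq_9_10": {"alpha2": 1, "beta2": 4, "delta": 0.3},
    "eq_12_15": {"sigma": 1, "xi": 1.5, "gamma0": 1.5},
    "eq_19_21": {"rho_f": 0.4, "rho_r": 0.35, "gamma1": 0.5, "W": 10},
    "ants": {"explorer": 50, "worker": 50},
}


def validate(params):
    """Check the constraints stated in the text: volatility factors in (0, 1)."""
    for name in ("rho_f", "rho_r"):
        rho = params["eq_19_21"][name]
        if not 0.0 < rho < 1.0:
            raise ValueError(f"{name} must lie in (0, 1), got {rho}")
    for colony, count in params["ants"].items():
        if count < 0:
            raise ValueError(f"ant count for {colony} must be non-negative")
    return True


ok = validate(HICOACR_PARAMS)
```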

Convergence analysis under different parameters

Convergence analysis mainly focuses on how the cryptogram resources requirements of the security protocol, the total scale of cryptogram resources, the step length of global optimization, the pheromone volatility factor, and the population collaboration influence model convergence. Training stops when the deviation between two adjacent solutions is less than 0.01.

Resources requirements of protocol and total scale of cryptogram resources

The total size of the cryptogram resources carried on nodes and the cryptogram resources requirements of the security protocol directly affect the performance of the reconfiguration decision model. To verify these influences, the authentication protocols listed in Table 2 are chosen as target security protocols to represent different scales of cryptogram resources requirements, and the total scale of cryptogram resources is varied from 0 to 500.

The influence of the cryptogram resources requirements and of the scale of cryptogram resources on model convergence is illustrated in Figure 5. The x-axis represents the total scale of cryptogram resources and the y-axis denotes the convergence time. Figure 5 shows that, when the cryptogram resources requirement is fixed, the larger the total cryptogram resource set, the longer the convergence time. The reason is that a larger resource set enlarges the search space of $RS$ , which slows convergence.

Figure 5.

Influence of resources requirements and resource size on convergence.

Meanwhile, for a fixed value on the x-axis, the convergence times of Hsieh’s scheme, Priauth, Jiang’s scheme, Roy-Chowdhury’s scheme, and Chen’s scheme increase in that order, as do their resources requirements. That is, when the total size of the cryptogram resources is fixed, the larger the resources requirement of the protocol, the slower the convergence. The reason is that the number of loops grows with the resources requirement. Even when concurrency strategies are adopted to optimize algorithm performance, the tendency remains the same because each loop depends on the pheromone updated in the previous loop.

Step type in solution verification and optimization phase

In the solution verification and optimization phase of HiCoACR, a Lévy flight-based stochastic gradient descent algorithm is adopted. The convergence rate of this algorithm greatly influences the convergence rate of HiCoACR, and the Lévy step plays a vital role in it. To analyze the influence of different step types on model convergence, the fixed step and the random step are taken as benchmarks: $Lévy (ξ) = 1$ for the fixed step and $Lévy (ξ) = 2 \times rand (1)$ for the random step, where $rand$ is a random number generator.

From Figure 6, we can see that the Lévy flight step outperforms the random step and the fixed step in convergence. The reason is that the Lévy flight step combines long and short steps, and the steps are adjusted according to the distance to the optimal solution, which results in faster convergence. In addition, the random step converges faster than the fixed step because the randomness of the step length increases the probability of finding the best reconfiguration solution.
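The three step types compared here can be sketched as follows. The fixed and random steps follow the definitions above. For the Lévy step, the sketch uses Mantegna's algorithm, a common way to draw Lévy-stable step lengths; this choice is an assumption, since the sampling method is not stated at this point in the text. The function names are hypothetical.

```python
import math
import random


def fixed_step():
    """Fixed step: Lévy(ξ) = 1."""
    return 1.0


def random_step(rng=random):
    """Random step: 2 * rand(1), uniform on [0, 2) with mean 1."""
    return 2.0 * rng.random()


def levy_step(xi=1.5, rng=random):
    """Draw one signed Lévy step with Mantegna's algorithm (an assumption)."""
    if not 0.0 < xi < 2.0:
        raise ValueError("xi must lie in (0, 2)")
    num = math.gamma(1.0 + xi) * math.sin(math.pi * xi / 2.0)
    den = math.gamma((1.0 + xi) / 2.0) * xi * 2.0 ** ((xi - 1.0) / 2.0)
    sigma_u = (num / den) ** (1.0 / xi)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the (very unlikely) case v == 0
        v = rng.gauss(0.0, 1.0)
    # Mostly short steps, with occasional very long ones (heavy tail).
    return u / abs(v) ** (1.0 / xi)


rng = random.Random(0)  # seeded generator so that runs are repeatable
steps = [levy_step(1.5, rng) for _ in range(1000)]
```

The heavy tail of the Lévy distribution is what produces the mix of long and short steps: most draws are small, while a few are much larger than any draw of the uniform random step.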

Figure 6.

Influence of step type on convergence.

Volatility factor of pheromone

In HiCoACR, a hierarchical pheromone is defined to reinforce positive behaviors; it imitates the memory of ant colonies and is updated in each loop. The pheromone volatility factor is used to simulate memory deterioration. To examine the effects of the volatility factor, the total size of the cryptogram resources is set to 100 and Hsieh’s scheme is chosen as the target security protocol. As the flow pheromone and the cryptogram resources pheromone each have their own volatility factor, the influence of the two factors is tested separately. The volatility factor of the resources pheromone is set to 0.08, 0.1, 0.12, 0.15, 0.2, 0.25, 0.3, and 0.35, while the volatility factor of the flow pheromone is set to 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, and 0.4.

From Figure 7, we can see that a larger volatility factor yields a shorter convergence time, that is, the larger the volatility factor, the faster the convergence. The reason is that the volatility factor denotes the probability of forgetting the historical interaction information among populations. When a pheromone drops to 0, the corresponding sub-goal generation action or matching action is forgotten, which influences the solution-searching ability. When the volatility factor gets larger, the pheromones of unvisited cryptogram resources or protocol flows, which are lower than those of frequently visited ones, drop to 0 much faster, and the probability that the visited ones are selected again increases, which speeds up convergence. However, a larger volatility factor also reduces the diversity of reconfiguration policies because of the rapid decline of pheromone, which decreases the randomness of sub-goal generation and cryptogram resources matching.
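A short worked example makes the effect of the volatility factor concrete. Consider a pheromone entry that is never reinforced, and assume the usual multiplicative evaporation in which each loop retains a fraction $1 - ρ$ (an assumption; the update equations of this article may include further terms). The entry after $n$ loops, and the number of loops $n_{ε}$ needed to fall below a fraction $ε$ of its initial value, are then

```latex
phe_{n} = (1 - ρ)^{n} \, phe_{0},
\qquad
n_{ε} = \left\lceil \frac{\ln ε}{\ln (1 - ρ)} \right\rceil
```

For $ε = 0.01$ , the largest tested value $ρ = 0.35$ gives $n_{ε} = \lceil 10.7 \rceil = 11$ loops, whereas the smallest tested value $ρ = 0.08$ gives $n_{ε} = \lceil 55.2 \rceil = 56$ loops. Under this assumption, unvisited options are effectively forgotten about five times sooner with the larger factor.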

Figure 7.

Influence of volatility factor of pheromone on convergence.

In addition, when the volatility factors of the protocol flow pheromone and the cryptogram resources pheromone are set to the same value, the convergence time obtained with the former is shorter than that obtained with the latter. The reason is that the volatility factor of the protocol flow pheromone not only affects the directions of reconfiguration but also influences the resources pheromone. When the volatility factor of the protocol flow pheromone gets larger, the probability that an unvisited protocol flow is selected decreases, so the pheromones of the cryptogram resources in unvisited protocol flows become lower than those of the cryptogram resources in frequently visited protocol flows. This further increases the convergence speed. Thus, the volatility factor of the protocol flow pheromone has a greater influence than that of the resources pheromone. If either of these two volatility factors is set unsuitably, the collaboration between the Explorer and the Worker breaks down.

Population collaboration

Population collaboration is one of the most vital mechanisms for reducing the search space of reconfiguration solutions in HiCoACR. To analyze its influence on model convergence, the numbers of ants in the Explorer and the Worker are changed to adjust the degree of collaboration between them. First, the number of ants in the Worker is fixed at 50 and the number of ants in the Explorer is set to $0, 10, 20, \dots, 100$ . Then, the number of ants in the Explorer is fixed at 50 and the number of ants in the Worker is set to $0, 10, 20, \dots, 100$ . The total size of the cryptogram resources is set to 100 and Hsieh’s scheme is chosen as the target security protocol.

Figure 8 shows that when the number of ants in the Explorer or in the Worker drops to 0, the convergence time gets longer, and that once the number of ants reaches a certain value, the convergence time tends to stabilize. The reason is that when the number of ants in the Explorer or the Worker equals 0, only one population exists and interpopulation collaboration disappears. HiCoACR then degenerates to the Cuckoo search algorithm when the number of ants in the Explorer is 0, or to the ant-Q algorithm when the number of ants in the Worker is 0, and the convergence time approximately reaches its ceiling. In addition, according to the experimental results, about 50 ants in each of the Explorer and the Worker is a suitable setting for the given protocol samples.

Figure 8.

Influence of population collaboration on convergence.

Performance analysis

Performance analysis compares the HiCoACR model with several benchmark algorithms. The size of the whole cryptogram resource set is set to 100; the ant colony algorithm,⁴⁰ the Cuckoo search algorithm,⁴¹ and reinforcement learning⁴² are taken as benchmark algorithms; and the authentication protocols in Table 2 are set as target security protocols.

Table 3 shows that the HiCoACR model has the lowest time cost and the highest accuracy for every protocol, which indicates that HiCoACR outperforms the given benchmark algorithms in both accuracy and efficiency. By design, the HiCoACR model divides the search for a reconfiguration solution into two sub-processes: controlling the directions of the protocol flow and matching cryptogram resources. The sub-goal generation phase, implemented by the Explorer, screens the reconfiguration solutions with higher RE and controls the directions of the sub-process according to protocol flows using an ant colony algorithm with Q-learning, which greatly narrows the solution space. The cryptogram resources scheduling phase, executed by the Worker, consists of sub-goal matching, candidate solution producing, and solution verification and optimization, and is responsible for producing and optimizing the reconfiguration solution. Compared with the above benchmark algorithms, HiCoACR combines the advantages of population collaboration and reinforcement learning. When the number of ants in the Explorer is 0, HiCoACR degenerates into the Cuckoo search algorithm. When the number of ants in the Worker is 0, HiCoACR degenerates into an ant colony algorithm with Q-learning. Thus, HiCoACR outperforms the given benchmarks in performance.

Table 3.

Performance analysis.

Algorithms	Items	Protocols
		Hsieh	Jiang	Priauth	Roy-Chowdhury	Chen
Ant colony	Time	2267.5	3016.7	2659.4	3635.5	5453.3
	Accuracy (%)	88.9	86.7	87.6	84.9	82.5
Cuckoo search	Time	1653.8	2543.1	1956.6	3124.2	4431.6
	Accuracy (%)	92.5	91.7	92.1	91.5	90.2
Reinforcement learning	Time	455.3	642.5	574.2	744.4	996.3
	Accuracy (%)	98.8	98.0	98.3	97.9	95.6
HiCoACR	Time	234.8	323.2	277.9	366.3	534.3
	Accuracy (%)	99.5	99.4	99.5	99.3	99.1
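To quantify the time advantage visible in Table 3, the short sketch below computes, for each protocol, the ratio of a benchmark's time to the HiCoACR time. The numbers are copied from Table 3; the helper name `time_ratio` is hypothetical.

```python
PROTOCOLS = ("Hsieh", "Jiang", "Priauth", "Roy-Chowdhury", "Chen")

# Time values copied from Table 3 (same order as PROTOCOLS).
TIME = {
    "Ant colony": (2267.5, 3016.7, 2659.4, 3635.5, 5453.3),
    "Cuckoo search": (1653.8, 2543.1, 1956.6, 3124.2, 4431.6),
    "Reinforcement learning": (455.3, 642.5, 574.2, 744.4, 996.3),
    "HiCoACR": (234.8, 323.2, 277.9, 366.3, 534.3),
}


def time_ratio(baseline, model="HiCoACR"):
    """Per-protocol ratio: time of the baseline divided by time of the model."""
    return {p: b / m
            for p, b, m in zip(PROTOCOLS, TIME[baseline], TIME[model])}


ratios = {name: time_ratio(name) for name in TIME if name != "HiCoACR"}
```

With these values, the reinforcement learning benchmark takes roughly 1.9 to 2.1 times as long as HiCoACR on every protocol, and the ant colony benchmark roughly 9.3 to 10.2 times as long.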

Model transferability analysis

Model transferability analysis verifies the transferability of a trained model to different categories of security protocols. Model transferability is mainly reflected in the acceleration of convergence while accuracy in reconfiguration decision-making is maintained. Thus, the average speed-up ratio is adopted to assess model transferability. The speed-up ratio is defined as the ratio of the convergence time of the baseline to the convergence time of the given mode, namely

sr = \frac{t_{b}}{t_{m}}

(24)

where $sr$ is the speed-up ratio of a certain protocol under mode $m$ , $t_{b}$ denotes the convergence time of the baseline, and $t_{m}$ is the convergence time of the given mode $m$ . The average speed-up ratio is the weighted mean of the speed-up ratios of all protocols in the test sample.

Three modes are considered in this experiment: without pre-training, pre-trained with the same category, and pre-trained with different categories. In the mode without pre-training, no reconfiguration policy is trained ahead of time, and the pheromones of both protocol flow and cryptogram resources are initialized to 0. In the mode pre-trained with the same category, the reconfiguration policy has been generated from security protocols of the same category as the test samples. In the mode pre-trained with different categories, the reconfiguration policy has been generated from security protocols whose category differs from that of the test samples.

The size of the total cryptogram resource set is set to 100; the first three authentication protocols in Table 2 are taken as training samples, and the remaining two authentication protocols, the key agreement protocols, and the communication protocols are adopted as test samples. The ant colony algorithm, the Cuckoo search algorithm, and reinforcement learning are again selected as benchmark algorithms.

With the time cost of the mode without pre-training taken as the baseline, Table 4 shows that the average speed-up ratio obtained by pre-training with protocols of the same category is larger than that obtained by pre-training with protocols of different categories. The reason is that pre-training with the same category provides more useful information for reconfiguration than pre-training with different categories. Meanwhile, the average speed-up ratio of HiCoACR is higher than that of the given benchmarks in both pre-trained modes. The reason is that both the protocol pheromone and the resources pheromone can provide reconfiguration policies during decision-making, which results in a lower time cost for HiCoACR than for the other benchmark algorithms. Thus, HiCoACR outperforms the given benchmarks in model transferability.
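Equation (24) and the averaging step can be sketched as below. The weights of the weighted mean are not specified here, so equal weights are assumed by default; the function names and the timings in the example are purely illustrative and are not measurements from this article.

```python
def speed_up_ratio(t_baseline, t_mode):
    """Equation (24): sr = t_b / t_m."""
    if t_mode <= 0:
        raise ValueError("convergence time of the mode must be positive")
    return t_baseline / t_mode


def average_speed_up(pairs, weights=None):
    """Weighted mean of per-protocol speed-up ratios.

    pairs   -- list of (t_baseline, t_mode), one pair per test protocol
    weights -- optional list of weights; equal weights are assumed if omitted
    """
    if not pairs:
        raise ValueError("at least one protocol is required")
    if weights is None:
        weights = [1.0] * len(pairs)
    if len(weights) != len(pairs):
        raise ValueError("weights and pairs differ in length")
    total = sum(weights)
    return sum(w * speed_up_ratio(tb, tm)
               for w, (tb, tm) in zip(weights, pairs)) / total


# Illustrative timings only: two test protocols, baseline vs pre-trained mode.
avg = average_speed_up([(300.0, 150.0), (400.0, 320.0)])
```

Multiplying the result by 100 gives the percentage form used when ratios are reported relative to a baseline of 100.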

Table 4.

Average speed-up ratio comparison.

Algorithms	Average speed-up ratio (%)
	Without pre-training	Pre-trained with same category	Pre-trained with different category
Ant colony	100	156.4	116.3
Cuckoo search	100	100.1	100.1
Reinforcement learning	100	117.7	105.2
HiCoACR	100	185.4	146.9

In conclusion, the hierarchically collaborative ant colony-based reconfiguration model proposed in this article offers higher convergence speed, better performance, and better model transferability than the benchmarks. However, as the ant colony algorithm suffers from long convergence time and easily falls into local optimal solutions, our model may share these problems to some degree, although an adaptive pseudorandom ratio selection rule and a parallel computing mechanism are adopted in the simulation to alleviate them.

Conclusion

Reconfiguration decision-making is one of the key processes in reconfigurable security protocol design and determines the performance of the resulting security protocol. This article combines the idea of hierarchical reinforcement learning with the collaborative behavior of biological populations and proposes a reconfiguration decision-making model named HiCoACR, based on a hierarchically collaborative ant colony algorithm. It takes advantage of reinforcement learning and population collaboration and handles the RDMP well. The method may also be applied to other decision-making problems such as military task programming and route planning. In future work, we will keep improving the model to reduce the problems of long convergence time and falling into local optimal solutions.

Footnotes

Acknowledgements

The authors would like to thank the Associate Editor and all the anonymous reviewers for their insightful comments and constructive suggestions.

Handling Editor: Suat Ozdemir

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant nos 61773399 and 61403400) and China Postdoctoral Science Foundation special funded project (grant no. 2017T100792).

ORCID iD

Zhuo Yi

References

1. Bossuet, Fischer, Gaspar, et al. Disposable configuration of remotely reconfigurable systems. Microprocess Microsy 2015; 39: 382–392.

2. Yoon, Park, Yoo. Security issues on smarthome in IoT environment. Lect Notes Electr En 2015; 330: 691–696.

3. Vezhnevets, Osindero, Schaul, et al. FeUdal networks for hierarchical reinforcement learning. PMLR 2017; 70: 3540–3549.

4. Kulkarni, Narasimhan, Saeedi, et al. Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. In: Proceedings of the annual conference on neural information processing systems (NIPS), Barcelona, Spain, 5–10 December 2016, pp.3675–3683. Neural Information Processing Systems Foundation.

5. Niu, Zhou, Wang. A hybrid particle swarm optimization for parallel machine total tardiness scheduling. Int J Adv Manuf Tech 2010; 49(5–8): 723–739.

6. Reynolds. Cooperative random Lévy flight searches and the flight patterns of honeybees. Phys Lett A 2006; 354(5): 384–388.

7. Jin, et al. An incremental genetic algorithm approach to multiprocessor scheduling. IEEE T Parall Distr 2004; 15(9): 824–834.

8. et al. A genetic algorithm for task scheduling on heterogeneous computing systems using multiple priority queues. Inform Sci 2014; 270(6): 255–287.

9. Bryan, James. A genetic algorithm methodology for complex scheduling problems. Nav Res Log 2015; 46(2): 199–211.

10. Abraham, Liu, Zhang, et al. Scheduling jobs on computational grids using fuzzy particle swarm algorithm. Future Gener Comp Sy 2010; 26(8): 1336–1343.

11. Kong, Sun. Particle swarm algorithm for tasks scheduling in distributed heterogeneous system. In: Proceedings of the sixth international conference on intelligent systems design and applications, Jinan, China, 16–18 October 2006, pp.690–695. New York: IEEE.

12. Fan, Winley. A heuristic search algorithm for flow-shop scheduling. Informatica 2008; 32(4): 453–464.

13. Wen, Yang. A heuristic-based hybrid genetic-variable neighborhood search algorithm for task scheduling in heterogeneous multiprocessor system. Inform Sci 2011; 181(3): 567–581.

14. Mencía, Sierra, Varela. Depth-first heuristic search for the job shop scheduling problem. Ann Oper Res 2013; 206(1): 265–296.

15. Sun, et al. A scheduling strategy on load balancing of virtual machine resources in cloud computing environment. In: Proceedings of the international symposium on parallel architectures, algorithms and programming, Dalian, China, 18–20 December 2011, pp.89–96. New York: IEEE.

16. Das, Viswanathan, Rittenhouse. Dynamic load balancing through coordinated scheduling in packet data systems. In: Proceedings of the twenty-second annual joint conference of the IEEE computer and communications societies, San Francisco, CA, 30 March–3 April 2003, pp.786–796. New York: IEEE.

17. Yan, Luh. A fuzzy optimization-based method for integrated power system scheduling and inter-utility power transaction with uncertainties. IEEE T Power Syst 2002; 12(2): 756–763.

18. Fagan, Fenton, Lynch, et al. Deep learning through evolution: a hybrid approach to scheduling in a dynamic environment. In: Proceedings of the international joint conference on neural networks, Anchorage, AK, 14–19 May 2017, pp.775–782. New York: IEEE.

19. Chen. Deep reinforcement learning for multi-resource multi-machine job scheduling. In: The 25th IEEE international conference on network protocols, Toronto, Canada, 10–13 October 2017, pp.1–2. New York: IEEE.

20. Chen, Clark, Jacob. A search-based approach to the automated design of security protocols. Report, York Computer Science Technical Report Series (“Yellow Reports”), Department of Computer Science, University of York, UK, April 2004.

21. Yang, Erdogan, Arslan, et al. Multi-objective evolutionary optimizations of a space-based reconfigurable sensor network under hard constraints. In: Proceedings of the ECSIS symposium on bio-inspired, learning, and intelligent systems for security, Edinburgh, 9–10 August 2007, pp.72–75. New York: IEEE.

22. Alrifai, Risse. Combining global optimization with local selection for efficient QoS-aware service composition. In: Proceedings of the international conference on World Wide Web, Madrid, 20–24 April 2009, pp.881–890. New York: ACM.

23. Chifu, Pop, Salomie, et al. Optimising the semantic web service composition process using bio-inspired methods. Int J Bio-Inspir Com 2013; 5(4): 226–238.

24. Zheng, Chen, et al. An efficient and reliable approach for quality-of-service-aware service composition. Inform Sci 2014; 269(4): 238–254.

25. Zhao, Shen, Peng, et al. Toward SLA-constrained service composition: an approach based on a fuzzy linguistic preference model and an evolutionary algorithm. Inform Sci 2015; 316: 370–396.

26. Ida, Russek, Jin, et al. The successor representation in human reinforcement learning. Nat Hum Behav 2017; 1: 680–692.

27. Roijers, Vamplew, Whiteson, et al. A survey of multi-objective sequential decision-making. J Artif Intell Res 2013; 48(1): 67–113.

28. Datta, Derek, Mitchell, et al. Secure protocol composition. In: Proceedings of the ACM workshop on formal methods in security engineering, Washington, DC, 30 October 2003, pp.11–23. New York: ACM.

29. Nielsen, Plotkin, Winskel. Configuration structures, event structures and Petri nets. Theor Comput Sci 2009; 410(41): 4111–4159.

30. Hsieh, Leu. Anonymous authentication protocol based on elliptic curve Diffie–Hellman for wireless access networks. Wirel Commun Mob Com 2014; 14(10): 995–1006.

31. Jiang, et al. An efficient ticket based authentication protocol with unlinkability for wireless access networks. Wireless Pers Commun 2014; 77(2): 1489–1506.

32. Chan, et al. Privacy-preserving universal authentication protocol for wireless communications. IEEE T Wirel Commun 2011; 10(2): 431–436.

33. Roy, Baras. A lightweight certificate-based source authentication protocol for group communications in hybrid wireless/satellite networks. In: Proceedings of the IEEE global telecommunications conference, New Orleans, LA, 30 November–4 December 2008, pp.1–6. New York: IEEE.

34. Chen, Chan, et al. Lightweight and provably secure user authentication with anonymity for the global mobility network. Int J Commun Syst 2011; 24(3): 347–362.

35. Mahmoud, Taha, Misic, et al. Lightweight privacy-preserving and secure communication protocol for hybrid ad hoc wireless networks. IEEE T Parall Distr 2014; 25(8): 2077–2090.

36. Khalil, Bagchi, Shroff. Analysis and evaluation of SECOS, a protocol for energy efficient and secure communication in sensor networks. Ad Hoc Netw 2007; 5(3): 360–391.

37. Cheikhrouhou, Koubâa, Dini, et al. RiSeG: a ring based secure group communication protocol for resource-constrained wireless sensor networks. Pers Ubiquit Comput 2011; 15(8): 783–797.

38. Tseng. A secure and privacy-preserving communication protocol for V2G networks. In: Proceedings of the IEEE wireless communications and networking conference, 1–4 April 2012, pp.2706–2711. New York: IEEE.

39. Amin, Biswas. A novel user authentication and key agreement protocol for accessing multi-medical server usable in TMIS. J Med Syst 2015; 39(3): 33.

40. Colorni, Dorigo, Maniezzo. Distributed optimization by ant colonies. In: Proceedings of the first European conference on artificial life (ECAL), Paris, 1 January 1991, pp.134–142. Amsterdam: Elsevier.

41. Yang, Deb. Cuckoo search via Lévy flights. In: Proceedings of the world congress on nature and biologically inspired computing (NaBIC), Coimbatore, India, 9–11 December 2009, pp.210–214. New York: IEEE.

42. Thomas, Marcus. Reinforcement learning for MDPs using temporal difference schemes. In: Proceedings of the IEEE conference on decision and control (IEEE CDC), San Diego, CA, 12 December 1997, pp.577–583. New York: IEEE.

HiCoACR: A reconfiguration decision-making model for reconfigurable security protocol based on hierarchically collaborative ant colony
