Abstract
To meet the demand for efficient computing services in big data scenarios, a cloud-edge collaborative computing allocation strategy based on deep reinforcement learning, which combines the powerful computing capabilities of the cloud, is proposed. First, taking computing resources, bandwidth, and migration decisions into comprehensive consideration, an optimization problem is constructed that minimizes the weighted sum of the execution delay and energy consumption of all user tasks. Second, a dynamic offloading scheduling algorithm based on Q-learning is designed to solve this optimization problem.
Introduction
Mobile cloud computing migrates the complex computing tasks of mobile devices over mobile networks, offloading them to cloud data centers with powerful computing and storage capabilities for processing, so as to meet the needs of complex emerging mobile applications.1 However, the mobile cloud computing architecture has limitations: the cloud data center is far away from mobile terminal devices, and a task to be processed must traverse the access network and the backhaul link before reaching the cloud.2–4 This leads to excessively long service latency in mobile cloud computing and makes it impossible to achieve the 5G design goal of providing users with millisecond end-to-end latency.5 In addition, the centralized mobile cloud computing architecture weakens the network's resistance to risks and attacks. For example, mobile cloud computing cannot meet the low-latency and high-reliability requirements of emerging applications such as the Internet of Things (IoT), Industry 4.0, and telemedicine.6–9 Therefore, it is necessary to design a new computing service architecture in the mobile network to support the deployment and development of new mobile applications.10
By sinking the computing and storage resources of the cloud to the edge of the network, services are provided for mobile devices close to the network edge.11–14 Mobile Edge Computing (MEC) can improve network data-processing throughput, achieve low-latency and high-reliability data-processing services, and resolve the dilemma faced by mobile cloud computing.15–18
In the MEC environment, in order to better satisfy users' Quality of Service (QoS) requirements and improve the quality of user experience, how to efficiently perform the computing and allocation of IoT resources is one of the hot issues that has been widely studied.19,20 The literature21 proposed to optimize the energy consumption of edge computing networks under delay constraints; it formulated the problem as a mixed-integer programming problem, further transformed it into an integer programming problem, and solved it with a dynamic programming algorithm, verifying its superiority over the branch and bound method. The literature22 designed a delay-sensitive microservice deployment system, which involves communication delay, calculation delay, and data migration delay. The shortest path algorithm and
When faced with scenarios containing a huge number of tasks, the limited resources of edge nodes and the lack of cloud-edge collaboration make it difficult to meet the high-efficiency data-processing requirements of big data scenarios. This article therefore proposes a cloud-edge collaborative computing allocation strategy based on deep reinforcement learning. When making migration decisions, the proposed algorithm fully considers the resource requirements of mobile terminal applications and the remaining resources of each MEC server. It always migrates tasks, as far as possible, to the most suitable computing resources for execution, so as to balance the resource utilization of MEC servers and allow more virtual machines to be deployed to execute tasks.
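Purely as an illustration of what such a resource-aware migration decision could look like, the following Python sketch scores candidate nodes by estimated delay and price; the node attributes, weights, and function names are assumptions made here for clarity, not the paper's learned decision rule.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A candidate execution site (edge server or cloud data center)."""
    name: str
    cpu_free: float      # remaining CPU cycles per second
    bandwidth: float     # available link bandwidth (bits/s)
    price: float         # unit price of computing resources

def pick_migration_target(task_cycles: float, task_bits: float,
                          nodes: list[Node]) -> Node:
    """Choose the node whose remaining resources best match the task.

    A hypothetical heuristic for illustration only; in the paper the
    migration decision is learned by the reinforcement-learning agent.
    """
    def score(n: Node) -> float:
        # Estimated delay: transmission time plus computation time.
        delay = task_bits / n.bandwidth + task_cycles / n.cpu_free
        # Penalize expensive nodes to keep migration overhead low.
        return delay + 0.1 * n.price
    # Only nodes with remaining capacity are feasible candidates.
    feasible = [n for n in nodes if n.cpu_free > 0 and n.bandwidth > 0]
    return min(feasible, key=score)
```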
System modeling
Network model
This article first considers the computing load balancing problem in the data-analysis application scenario of IoT devices in a large-scale cloud-edge collaborative network. The large-scale cloud-edge collaborative network architecture is shown in Figure 1. The main research focus is that the huge amount of data generated by the underlying IoT devices is transmitted and processed over wireless and wired networks, finally converging at the cloud data center for big data analysis and processing. Large-scale intelligent networks, such as smart homes and smart cities, need to analyze the data collected by IoT device nodes to complete the corresponding intelligent processing. The centralized processing solution of the traditional cloud data center cannot meet existing big data processing needs: the massive amount of data generated by the huge number of IoT devices poses a huge challenge to the transmission capacity of the backhaul link and the processing capacity of the core network. Thus, on the one hand, it is necessary to take advantage of the nearby services of edge nodes to meet the low-latency requirements of IoT applications; on the other hand, collaboration between edge nodes and cloud data centers is needed to cope with the arrival of intensive data and disperse the pressure of data processing, thereby ensuring the normal operation of the network.

Figure 1. Large-scale edge cloud collaborative network architecture.
Figure 1 shows the system model of the problem being studied. The system adopts a three-tier architecture (i.e. it includes three types of nodes), from bottom to top: IoT devices, edge nodes, and cloud data centers. The underlying IoT devices generate a large number of tasks that require big data analysis and processing; these are transmitted to the corresponding edge server nodes via wireless links and then converge upward to the uppermost cloud data center node for centralized big data analysis and processing. The edge server nodes in the middle tier, such as base stations, Wi-Fi access points, switches, routers, and gateways, are connected by wired links to form an edge cloud. Edge nodes have abundant computing resources: when forwarding the data generated by the underlying IoT devices, they can use their own computing resources to preprocess the data and transmit the preprocessing results to the cloud data center. In the process of data processing and transmission, multiple edge nodes can process and transmit data collaboratively.
The set of cloud data centers and edge nodes in the network is
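The formal set notation is elided in this version of the text. Purely to make the three-tier model above concrete, the sketch below encodes the node hierarchy in Python; every class and field name is a hypothetical illustration, not the paper's notation.

```python
from dataclasses import dataclass, field

@dataclass
class IoTDevice:
    """Bottom tier: generates tasks and uploads them over a wireless link."""
    device_id: int
    local_cpu: float          # local computing capability (cycles/s)

@dataclass
class EdgeNode:
    """Middle tier: preprocesses data near the devices."""
    node_id: int
    cpu: float                # computing capability
    bandwidth: float          # wireless bandwidth toward devices
    peers: list["EdgeNode"] = field(default_factory=list)  # wired edge cloud

@dataclass
class CloudDataCenter:
    """Top tier: centralized big data analysis."""
    cpu: float
    edges: list[EdgeNode] = field(default_factory=list)
```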
Calculation model
Based on the definition of the network model in the previous section, suppose
where
Each edge node synchronously updates and maintains the above-mentioned task table within its edge cloud. This synchronization only requires each edge node to update the table information after making a task migration decision and broadcast it to all edge nodes under the same edge cloud. Therefore, the total task set
The total task set
Local execution
When the local calculation time is less than the maximum allowable delay of user
where
where
Combined with the local calculation delay formula and corresponding energy consumption formula, the total cost of local calculation can be expressed as
where
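The local-execution formulas referenced above are elided in this version of the text. For readability, a standard form that such a model typically takes is sketched below; the notation ($C_i$ for the CPU cycles required by task $i$, $f_i^{l}$ for the local CPU frequency, $\kappa$ for the effective capacitance coefficient, and $\lambda_t$, $\lambda_e$ for the delay and energy weights) is an assumption for illustration, not necessarily the paper's own:

\[
T_i^{l} = \frac{C_i}{f_i^{l}}, \qquad
E_i^{l} = \kappa \left(f_i^{l}\right)^{2} C_i, \qquad
Z_i^{l} = \lambda_t T_i^{l} + \lambda_e E_i^{l}, \quad \lambda_t + \lambda_e = 1.
\]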
Migration execution
When the local calculation time of user task
Because each user migrates to a different edge node with different uplink and downlink rates, suppose the uplink rate of task
where
Assuming that the downlink and uplink have the same channel environment and noise, the downlink rate can be expressed as
According to the above steps, the transmission delay of user
According to the transmission delay, the corresponding transmission energy consumption can be obtained
where
When the data have been transmitted to the edge node, the CPU resources allocated to the task by the edge node are used for the calculation, and the calculation time of the task for user
where
When edge node
where
Based on the transmission delay of edge node
where
Combining equations (8), (10), and (11), we can obtain the total delay of the execution process of user task
Therefore, it can be obtained that in the process of user task
where
Combining equations (9), (12), and (14), it can be obtained that the total energy consumption on the user side in the process of migrating task
Finally, combining the total delay and total energy consumption during the execution of task migration for user
where
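As with the local case, the migration formulas are elided here. Under the usual assumptions (a Shannon-capacity uplink, $D_i$ input data bits, $B_j$ channel bandwidth, $p_i$ transmit power, $h_{i,j}$ channel gain, $\sigma^2$ noise power, $f_{i,j}^{e}$ the CPU frequency allocated by edge node $j$), the quantities referenced in equations (8)–(16) typically take forms such as the following; all symbols are assumptions, not necessarily the paper's notation:

\[
r_{i,j}^{u} = B_j \log_2\!\left(1 + \frac{p_i h_{i,j}}{\sigma^2}\right), \qquad
T_i^{t} = \frac{D_i}{r_{i,j}^{u}}, \qquad
E_i^{t} = p_i \, T_i^{t},
\]
\[
T_{i,j}^{e} = \frac{C_i}{f_{i,j}^{e}}, \qquad
T_i^{m} = T_i^{t} + T_{i,j}^{e} + T_i^{d}, \qquad
Z_i^{m} = \lambda_t T_i^{m} + \lambda_e E_i^{m},
\]

where $T_i^{d}$ would denote the downlink delay for returning results and $E_i^{m}$ the total user-side energy during migration.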
Optimization problem description
Latency and energy consumption are the two core indicators for measuring network performance. The optimization objectives of this article focus on the completion time and energy consumption of all tasks at the user level. The specific optimization goal is to minimize the weighted sum of the task execution delay and energy consumption of all users, that is, the total cost
s.t.
In the above optimization problem, the objective function (17) minimizes the weighted sum of the completion time of all tasks and the energy consumption on the user side, expressed by the total cost
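Since the objective (17) and its constraints are not reproduced in this version, a hedged reconstruction of the typical form of such a problem, reusing the sketch notation above with binary migration decisions $a_i \in \{0,1\}$, is:

\[
\min_{\{a_i\},\,\{f_{i,j}^{e}\}} \; \sum_{i=1}^{N} \Big[(1 - a_i)\, Z_i^{l} + a_i\, Z_i^{m}\Big]
\quad \text{s.t.} \quad
\sum_{i:\,a_i = 1} f_{i,j}^{e} \le F_j \;\; \forall j, \qquad
T_i \le T_i^{\max} \;\; \forall i,
\]

where $F_j$ would be the total CPU capacity of edge node $j$ and $T_i^{\max}$ the maximum allowable delay of task $i$ (both assumed names).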
Dynamic offloading method of edge computing based on reinforcement learning
Reinforcement learning
Reinforcement learning is an important branch of machine learning in which an agent learns how to obtain the maximum reward through interaction with the environment. Unlike supervised learning, reinforcement learning cannot learn from samples provided by an experienced external supervisor; instead, it must learn from its own experience, even though it faces greater uncertainty in the environment. Reinforcement learning is defined not by a learning method but by a learning problem: any method suitable for solving this problem can be considered a reinforcement learning method. A reinforcement learning problem generally includes the following elements:
State: it describes the current environmental situation. For example, in a Go program, the state is the position of the pieces on the board. The state space refers to all possible environmental conditions.
Action: it represents the actions that the agent may perform in each state. The action space refers to all possible operations of the agent.
Reward: the feedback obtained after performing a certain action in a certain state. The reward may be positive or negative (i.e. a punishment).
State transition probability: it indicates the probability value of the system transitioning to the next state after performing a certain action in a certain state.
Strategy: it indicates the mapping relationship between states and actions, that is, which action to execute in a certain state, usually expressed as
Value function: reinforcement learning focuses on maximizing the long-term reward, not the instant reward. If the agent only maximized the instant reward, it would simply choose the action with the largest instant reward each time, which degenerates into a simple greedy strategy. To represent the long-term reward accumulated from the current moment until the goal state is reached, a value function is used to describe this quantity, as formalized below.
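For concreteness, the standard definitions behind these last two elements can be written as follows, with discount factor $\gamma \in [0,1)$; this is the conventional textbook notation, not necessarily the paper's:

\[
G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad
V^{\pi}(s) = \mathbb{E}_{\pi}\left[G_t \mid s_t = s\right].
\]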
A reinforcement learning problem can usually be described as the optimal control of a Markov Decision Process (MDP), but it is not necessary to know the state space, transition probability, and reward function in advance. Thus, reinforcement learning is very effective for dealing with complex real-world problems. An MDP consists of a finite number of states, actions, and rewards, which can be expressed as
where
Reinforcement learning has two outstanding features: trial-and-error search and delayed reward. Trial-and-error search implies a trade-off between exploration and exploitation: the agent prefers to exploit effective actions it has tried in the past to generate rewards, but it must also explore new actions that may yield higher returns in the future. The agent must try a variety of actions and gradually favor those that earn the most reward. The other feature, delayed reward, means that the agent must consider not only the immediate reward but also the long-term accumulated reward, that is, the value function.
Generally speaking, depending on whether the environment model (i.e. the state transition probability and reward function) is known, reinforcement learning can be divided into model-free and model-based reinforcement learning. Model-free reinforcement learning has recently been successfully combined with deep neural networks, which can take the raw state directly as input and learn strategies for more difficult tasks. In contrast, model-based reinforcement learning learns the system model with the help of supervised learning and optimizes the strategy under this model. In recent years, model-based elements have been incorporated into model-free deep reinforcement learning to increase the learning speed without losing the advantages of model-free learning.
Dynamic offloading scheduling algorithm based on Q-learning
Under normal circumstances, due to the dynamic changes of the network environment, conventional dynamic programming algorithms or model-based algorithms cannot effectively solve such optimization problems without prior knowledge of the environment, because the agent cannot predict the next state of the environment before acting. Thus, this article uses the model-free reinforcement learning method Q-learning to study the offloading optimization problem.
where
In the process of learning and training of
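Since the update equation and training details are elided above, the following is a minimal, self-contained sketch of tabular Q-learning applied to an offloading decision. The state, action, and reward encodings here are hypothetical simplifications, not the paper's exact design; the update rule $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$ is the standard one.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration

# Q-table: maps (state, action) to its estimated long-term value.
Q = defaultdict(float)

def choose_action(state, actions):
    """Epsilon-greedy: mostly exploit the best known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """Standard Q-learning temporal-difference update."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Hypothetical usage: action 0 = execute locally, actions 1..3 = offload to an edge node.
ACTIONS = [0, 1, 2, 3]
state = ("task_small", "net_good")   # toy state encoding
action = choose_action(state, ACTIONS)
reward = -1.0                        # e.g. negative weighted delay-plus-energy cost
next_state = ("task_small", "net_busy")
q_update(state, action, reward, next_state, ACTIONS)
```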
Since offloading optimization is considered in a dynamic network environment, it is necessary to determine which task or tasks can be executed simultaneously in the time slot

Figure 2. Task scheduling process in dynamic scenario.
Based on Figure 2, this article gives the specific steps of task scheduling in a dynamic environment (a code sketch of this loop follows the steps):
Step 1: after the IoT service is started by remotely issuing instructions, the preparation time and completion time of each task are initialized in the RT table and the FT table, respectively, and the scheduling queue
Step 2: when
Step 3: update RT table according to inter-task dependency information
Step 4: search the RT table for the unscheduled tasks with the smallest value; there may be none, one, or more tasks that meet the condition. Add these tasks to the scheduling queue
Step 5: check whether the scheduling queue is empty. If it is not empty, it means that there are still tasks to be calculated. Repeat Steps 2–5 in the next time slot
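A minimal sketch of the Steps 1–5 loop is given below, assuming hypothetical RT/FT table structures, a simple dependency map, and an `execute` callback; none of these names come from the paper, and the elided details of Steps 2 and 3 are simplified here.

```python
def schedule_tasks(tasks, deps, execute):
    """Skeleton of the dynamic scheduling loop of Steps 1-5.

    tasks:   iterable of task ids
    deps:    dict task -> set of predecessor tasks that must finish first
    execute: callback that runs (or offloads) a batch of ready tasks and
             returns their finish times
    """
    rt = {t: 0.0 for t in tasks}     # Step 1: ready-time (RT) table
    ft = {t: None for t in tasks}    # Step 1: finish-time (FT) table
    scheduled = set()
    slot = 0
    while len(scheduled) < len(tasks):          # Step 5: queue not yet empty
        # Step 3: a task becomes ready once all its predecessors have finished.
        ready = [t for t in tasks
                 if t not in scheduled
                 and all(ft[p] is not None for p in deps.get(t, ()))]
        if not ready:
            break   # remaining tasks are blocked (e.g. cyclic dependencies)
        # Step 4: pick the unscheduled tasks with the smallest RT value.
        rt_min = min(rt[t] for t in ready)
        queue = [t for t in ready if rt[t] == rt_min]
        for t, finish in zip(queue, execute(queue)):   # Step 2: run the batch
            ft[t] = finish
            scheduled.add(t)
        slot += 1                                # move on to the next time slot
    return ft
```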
Simulation results and analysis
In the simulation scenario of this article, it is assumed that there are three edge nodes, whose bandwidths are 100, 150, and 200 MHz, respectively. The computing capabilities of the edge nodes are 150, 200, and 250 Mb/s, and their computing energy consumption per unit time is 0.001, 0.002, and 0.003 J, respectively. At the same time, it is assumed that the number of user terminals is 30; that is, 30 user terminals have tasks to be calculated. The task data size of each user terminal is randomly generated between 100 and 500 Mb. The distance between each user terminal and the edge node is also randomly generated. Furthermore, it is assumed that the local computing capability
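For reference, the stated simulation parameters can be summarized in a small configuration block; the values follow the text, while the variable names are ours.

```python
import random

SIM_CONFIG = {
    "edge_bandwidth_mhz":  [100, 150, 200],       # per edge node
    "edge_compute_mbps":   [150, 200, 250],       # computing capability
    "edge_energy_j_per_s": [0.001, 0.002, 0.003], # energy per unit time
    "num_users":           30,
}

def random_task_size_mb():
    """Task data size per user terminal, drawn uniformly from 100-500 Mb."""
    return random.uniform(100, 500)
```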
Figure 3 evaluates the impact of different learning rates on the reward value in the deep neural network. From the figure, it can be found that (1) as the learning rate decreases, the convergence of the reward value gradually slows down, because a learning rate that is too small makes each iteration's optimization step too inefficient; thus, the learning rate in the deep neural network cannot be too low. (2) When the learning rate is too large, as the number of iterations increases, the optimum may be overshot, causing oscillations near the optimal value. Thus, the learning rate of the deep neural network in this article can be neither too low nor too high. Based on the results of multiple simulations, the learning rate finally selected in this article is 0.001.

Figure 3. Convergence of reward value under different learning rates.
Figure 4 shows the relationship between the total energy consumption of IoT devices and the number of IoT devices participating in the service under different algorithms. In a dynamic scenario, the number of tasks

Figure 4. Relationship between total energy consumption of IoT devices and number of IoT devices.
Figure 5 illustrates the impact of different average data sizes on the total energy consumption of IoT devices. It can be observed that as the average data size increases, the energy consumption of IoT devices increases approximately linearly for all three methods. Moreover, the energy consumption of the proposed algorithm is the smallest of the three. This shows that the proposed algorithm can adaptively adjust the offloading strategy according to the data size, thereby efficiently optimizing the total energy consumption of IoT devices. However, because of the dependencies between tasks, the waiting energy consumption becomes non-negligible: the comparison methods show poor energy performance when the average data size is large, which directly increases the energy consumption of other IoT devices. Under different average data sizes, the proposed algorithm achieves the best energy-saving effect.

Figure 5. Impact of different average data sizes on the total energy consumption of IoT devices.
In order to analyze the performance of the edge-computing-based IoT resource allocation strategy, two other migration location selection strategies are used for comparison. One is the fastest-response-time strategy (Best Response, BR for short), in which the node with the shortest completion time in the previous round of task processing is selected as the migration destination. The other is the random selection strategy (Random), in which any service node on the MEC platform is randomly selected as the migration destination. Figure 6 compares the average task completion time of these three strategies in the MEC environment when different numbers of application tasks participate in migration. The task completion time refers to the interval from when a task is submitted to when the task is completed and the result is returned to the user.

Figure 6. Comparison of task completion time of three strategies.
It can be seen from Figure 6 that the edge-computing-based IoT resource allocation strategy is always better than the Random and BR strategies: its average task completion time is always the smallest. When the number of migration tasks is small, the advantage of the proposed algorithm in average task completion time is not obvious, because the available resources are relatively sufficient at the beginning, resource utilization is relatively balanced, and the corresponding virtual machines can be deployed. As the number of tasks involved in migration increases, the average task completion time of the proposed algorithm becomes much lower than that of the other two strategies, because the total amount of MEC resources is fixed and the proposed algorithm makes full use of the computing power of both cloud and edge, effectively meeting the demand for efficient computing services in IoT scenarios. Facing dynamic changes in the environment of edge nodes in the edge cloud, the proposed algorithm can adaptively adjust the migration strategy, which improves the efficiency of resource allocation.
Figure 7 compares the average task overhead of the three strategies. Task overhead refers to the actual scheduling cost that users pay to complete task migration in the MEC environment.

Figure 7. Comparison of task overhead of three strategies.
It can be seen from Figure 7 that as the number of tasks increases, the competition between tasks intensifies and the task waiting time grows. Therefore, the average task completion time increases, the unit price of resources also rises dynamically, and the average task overhead increases accordingly. The average task migration overhead of the edge-computing-based IoT resource allocation strategy is significantly better than that of the Random and BR strategies, because the proposed algorithm considers both the utilization and the price of computing resources when making migration decisions: it is always more inclined to migrate tasks to computing nodes that match their resource requirements and have relatively low prices. This reduces the migration overhead of application tasks to a certain extent.
Conclusion
This article constructs the problem of minimizing the weighted sum of user task execution delay and energy consumption, and then proposes an edge-computing-based IoT resource allocation strategy to solve the optimization problem. The proposed algorithm shows efficient energy-saving performance under different numbers of IoT devices. The comparison methods lack consideration of cloud-edge collaboration, and it is often difficult for them to make optimal decisions in highly dynamic and diverse environments.
In future work, the first direction is to consider the mobility of mobile terminals: predict their moving direction and trajectory, calculate the connection time between a mobile terminal and the computing resources, and, after judging whether task migration can be completed within this connection time, make further migration decisions. The second direction is to further consider the allocation of other physical-layer resources, such as spectrum and power.
Acknowledgements
The author expresses the appreciation to the reviewers for their helpful suggestions which greatly improved the presentation of this article.
Handling Editor: Yanjiao Chen
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was financially supported by the National Education Information Technology Research 2018 project funding, project no. 186130014.
