Abstract
Message queuing telemetry transport has emerged as a promising communication protocol for resource-constrained electric Internet of things due to its high bandwidth utilization, simple implementation, and various quality of service levels. Enabled by message queuing telemetry transport, electric Internet of things gateways adopt dynamic protocol adaptation, conversion, and quality of service level selection to realize bidirectional communication with massive devices and platforms based on heterogeneous communication protocols. However, protocol adaptation and quality of service guarantee in message queuing telemetry transport-empowered electric Internet of things still face several challenges, such as unified communication architecture, differentiated quality of service requirements, lack of quality of service metric models, and incomplete information. In this paper, we first establish a unified communication architecture for message queuing telemetry transport-empowered electric Internet of things for adaptation and conversion of heterogeneous protocols. Second, we formulate the quality of service level selection optimization problem to minimize the weighted sum of packet-loss ratio and delay. Then, a delay-reliability-aware message queuing telemetry transport quality of service level selection algorithm based on upper confidence bound is proposed to learn the optimal quality of service level by dynamically interacting with the environment. Compared with single and fixed quality of service level selection strategies, delay-reliability-aware message queuing telemetry transport quality of service level selection can effectively reduce the weighted sum of delay and packet-loss ratio and satisfy the differentiated quality of service requirements of electric Internet of things.
Introduction
Electric Internet of things (EIoT) can provide significant support for the intelligence, digitalization, and transparency of the power grid by collecting operation parameters, including voltage, current, and active and reactive power, in a timely manner and transmitting them to the cloud platform for processing and analysis. 1 In EIoT, the communication devices produced by different manufacturers utilize multiple communication protocols for data transmission and information interaction. 2 Typical communication protocols in EIoT include message queuing telemetry transport (MQTT), data distribution service (DDS), constrained application protocol (CoAP), hypertext transfer protocol (HTTP), etc. DDS is commonly used for state monitoring in EIoT. 3 CoAP is particularly suitable for services like meter reading management and load forecasting. 4 HTTP is applicable for high-performance devices with large computing and storage resources in EIoT. 5 MQTT is suitable for lightweight data transmission of gateways due to its high bandwidth utilization and simple implementation. 6 The gateway can achieve the adaptation and conversion of different protocols to MQTT. Through the information interaction between gateways based on MQTT, connectivity and interoperability among different devices can be achieved, which shields the differences among various protocols.
Quality of service (QoS) guarantee is of vital importance in the process of data transmission between gateway and platform in EIoT. 7,8 MQTT provides three QoS levels, that is, at most once (QoS0), at least once (QoS1), and exactly once (QoS2), 9 which provide different QoS guarantees in terms of transmission delay and packet-loss ratio. Specifically, the transmission delay of QoS0 is relatively low but its packet-loss ratio is high, while QoS1 and QoS2 achieve no packet loss at the expense of increased transmission delay. Moreover, QoS1 guarantees that the data packet is successfully transmitted at least once, and QoS2 ensures that the data packet is successfully transmitted exactly once by leveraging a more complicated retransmission mechanism. Therefore, it is necessary to dynamically and intelligently select MQTT QoS levels for data transmission between gateway and platform according to the time-varying network state and QoS requirements in EIoT. 10
However, dynamic MQTT QoS level selection still faces some challenges, which are summarized as follows. First, the QoS requirements of control services and acquisition services differ in terms of delay and reliability. 11–13 However, the different metrics are contradictory; for example, adopting a retransmission mechanism ensures a lower packet-loss ratio but greatly increases transmission delay. Therefore, it is a critical challenge to achieve a balanced trade-off among different QoS metrics. 14 Second, the current delay and packet-loss ratio models do not take the impact of protocol-specific QoS guarantee mechanisms on the physical-layer performance into consideration. Therefore, deriving accurate closed-form models of delay and packet-loss ratio that are adapted to MQTT-specific QoS levels is challenging. Last but not least, due to network resource limitation and prohibitive signaling overhead, the global state information (GSI), for example, channel gain, is uncertain. 15–17 Therefore, it is necessary to intelligently optimize MQTT QoS level selection under incomplete information. 18
There exist some works that have addressed MQTT QoS level selection problems in IoT. Sadeq et al. 19 proposed a QoS approach for IoT environment utilizing MQTT and designed a flow control mechanism to minimize the transmission delay. Niruntasukrat et al. 20 proposed an authorization mechanism for MQTT-based IoT service platform to minimize delay and message overhead. However, these works have not considered the joint optimization of delay and packet-loss ratio. Lee et al. 21 proposed a push notification service network utilizing MQTT protocol to minimize the packet loss and delay by selecting appropriate QoS level according to different payloads. Nurwarsito et al. 22 proposed a communication architecture using MQTT protocol for emergency vehicles which aims to minimize the packet loss and average delay. However, the above-mentioned works have not considered uncertain GSI in practical EIoT application scenarios. Weerasinghe et al. 23 proposed an MQTT-based localization mechanism for wireless sensor network by utilizing supervised learning. Ahmadon et al. 24 proposed a machine learning-based anomaly detection method for MQTT-based network. However, these works need offline scene data, which cannot adapt to the complex environment in EIoT. 25
Reinforcement learning provides a powerful tool to deal with sequential decision problems under incomplete information. 26–28 Among various reinforcement learning algorithms, upper confidence bound (UCB), originally developed for multi-armed bandit (MAB) problems, offers rapid convergence and a well-balanced trade-off between exploitation and exploration. Zhou et al. 29 proposed an energy-aware and data backlog-aware UCB-based channel selection algorithm, which can improve energy efficiency and throughput. However, delay and reliability are not taken into account. Endo et al. 30 proposed a distributed QoS-UCB channel selection algorithm considering channel rating quality, which can improve the reliability and reduce the delay while avoiding congestion. However, this work has not considered the complex communication environment in EIoT and MQTT-specific QoS level selection optimization.
Motivated by the aforementioned challenges, we propose a delay-reliability-aware protocol adaptation and QoS guarantee method for EIoT based on reinforcement learning. First, considering the adaptation and conversion of heterogeneous protocols, we establish a communication architecture of EIoT based on MQTT. Second, we propose a delay-reliability-aware MQTT QoS level selection (DR-MQLS) algorithm based on UCB to minimize the weighted sum of packet-loss ratio and delay. Last but not least, simulations are carried out to validate the effectiveness of DR-MQLS. Compared with single and fixed QoS level selection strategies, DR-MQLS can effectively reduce the weighted sum of packet-loss ratio and delay and satisfy the differentiated QoS requirements in EIoT. We summarize the main contributions of this work as follows:
Intelligent QoS Guarantee under Incomplete Information: DR-MQLS enables the gateway to interact with the environment and learn the optimal QoS level selection based on UCB under incomplete information. DR-MQLS can realize intelligent QoS guarantee with only local information.
Delay and Reliability Awareness: The closed-form models of delay and packet-loss ratio for three MQTT-specific QoS levels are derived. The optimization objective is defined to minimize the weighted sum of packet-loss ratio and delay. DR-MQLS can achieve delay and reliability awareness by selecting the MQTT QoS levels according to the specific QoS requirements of EIoT services.
Extensive Performance Evaluation: Extensive simulations are carried out to demonstrate the effectiveness and reliability of DR-MQLS. Specifically, the effects of various parameter settings, such as the signal-to-noise threshold and the weight of delay in the optimization objective, are illustrated to provide guidance for practical application.
The remainder of this article is organized as follows. In section “System model and problem formulation,” we describe the system model and problem formulation in detail. The proposed DR-MQLS algorithm is introduced in section “Delay-reliability-aware MQTT QoS level selection in EIoT.” Section “Simulation results” provides simulation results. In section “Conclusion,” we summarize this article.
System model and problem formulation
The considered communication architecture of EIoT based on MQTT is shown in Figure 1, 31,32 and consists of an MQTT broker server, a cloud platform, multiple EIoT devices, and multiple gateways. The gateways, which provide protocol adaptation and conversion functions, adopt the publish/subscribe pattern for information interaction with the cloud platform and can act as both publishers and subscribers. The broker server, which is deployed on the cloud platform, acts as an intermediary for data transmission between publishers and subscribers. The publisher notifies the broker server of the topics that it intends to publish. Then, the broker server keeps the topics and pushes them when subscribers ask for relevant topics. Multiple communication protocols are used for data transmission between EIoT devices and gateways, for example, HTTP, CoAP, and DDS. By parsing and repackaging protocol messages, the gateway achieves the conversion between these protocols and the MQTT protocol. An example is shown in Figure 1. The broker server pushes the subscribed topic and transmits the related data to the gateway based on the transmission mechanism specified by the MQTT QoS1 level. Then the gateway executes protocol adaptation and conversion to repackage protocol messages based on DDS, CoAP, and HTTP and transmits the data to the corresponding EIoT devices.
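The parse-and-repackage step and the publish/subscribe pattern described above can be illustrated with a small in-memory sketch. It is not the authors' implementation: the `Broker` and `MqttMessage` classes, the `http_to_mqtt` helper, the path-to-topic mapping, and the topic name are all hypothetical, and no real MQTT library or network transport is involved.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MqttMessage:
    """Simplified stand-in for an MQTT PUBLISH packet (hypothetical type)."""
    topic: str
    payload: bytes
    qos: int = 0


class Broker:
    """In-memory broker: keeps subscriptions and pushes matching messages."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[MqttMessage], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[MqttMessage], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, message: MqttMessage) -> int:
        """Deliver to every subscriber of the exact topic; return the count."""
        callbacks = self._subscribers.get(message.topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


def http_to_mqtt(method: str, path: str, body: dict, qos: int = 1) -> MqttMessage:
    """Parse an HTTP-style request and repackage it as an MQTT-style message.

    Mapping the URL path to the topic is an illustrative assumption.
    """
    topic = path.strip("/")
    payload = json.dumps({"method": method, "data": body}, sort_keys=True).encode("utf-8")
    return MqttMessage(topic=topic, payload=payload, qos=qos)


# A subscriber registers interest in a topic, then the gateway converts
# an HTTP-style device report and publishes it through the broker.
received: List[MqttMessage] = []
broker = Broker()
broker.subscribe("substation/feeder1/voltage", received.append)

msg = http_to_mqtt("POST", "/substation/feeder1/voltage", {"value_kv": 10.2})
delivered = broker.publish(msg)
```

The same structure would apply to CoAP or DDS inputs: only the parsing helper changes, while the broker-facing message format stays the same.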

Figure 1. Communication architecture of EIoT based on MQTT.
We assume that there are
We assume that the channel state remains unchanged during the transmission of a small packet but varies across different small packets. 33 In particular, each retransmission is considered as a small packet transmission process for QoS1 and QoS2, which adopt retransmission mechanisms. The channel gain 34,35 of the nth transmission of the jth small packet of the ith large packet is given by
where
Figure 2 shows MQTT data transmission processes of three QoS levels. The packet-loss ratio and delay models of the three QoS levels are elaborated in the following.

Figure 2. MQTT data transmission processes of three QoS levels.
QoS0 level
QoS0 provides best-effort delivery of the PUBLISH packet. After the gateway sends the PUBLISH packet to the broker server, the transmission process is completed immediately, regardless of whether the broker server receives the packet. Therefore, although the transmission delay of QoS0 is low, the packet-loss ratio is relatively high under poor channel states.
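A minimal sketch of this best-effort behavior is given below. It assumes that a small packet counts as lost when its signal-to-noise ratio falls below a threshold. This loss criterion, the function name `qos0_send`, and all numeric values are illustrative assumptions rather than the model derived in this article.

```python
import random
from typing import List


def qos0_send(channel_gains: List[float], tx_power: float, noise_power: float,
              snr_threshold: float) -> float:
    """Return the packet-loss ratio of one large packet sent with QoS0.

    Each small packet is sent exactly once. It counts as lost when its SNR
    falls below snr_threshold (an assumed loss criterion).
    """
    lost = 0
    for gain in channel_gains:
        snr = tx_power * gain / noise_power
        if snr < snr_threshold:
            lost += 1  # no acknowledgement exists, so nothing is resent
    return lost / len(channel_gains)


# Hypothetical channel: one independent gain per small packet.
rng = random.Random(0)
gains = [rng.uniform(0.0, 1.0) for _ in range(1000)]
loss_ratio = qos0_send(gains, tx_power=1.0, noise_power=0.1, snr_threshold=5.0)
```

With these assumed values a packet is lost whenever its gain is below 0.5, so raising the threshold or lowering the transmit power directly raises the loss ratio, while the number of transmissions per packet stays at one.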
Packet-loss ratio model
QoS0 level for data transmission has only one PUBLISH packet transmission process. Therefore, the packet-loss variable of the jth small packet of the ith large packet in QoS0 level is given by
where
Here,
Delay model
The transmission delay of the jth small packet of the ith large packet in QoS0 level is given by
where
The total delay of the ith large packet is given by
QoS1 level
QoS1 adopts a PUBACK packet to acknowledge the reception of the PUBLISH packet. If the PUBACK packet is not received by the gateway within a certain time, the PUBLISH packet is retransmitted. In this case, the PUBLISH packet is received at least once at the broker server. A data deduplication process is required to delete the duplicate packets at the expense of a certain data processing delay. 36 Therefore, the packet-loss ratio in QoS1 level is zero, but the transmission delay and data deduplication delay are relatively high.
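The at-least-once behavior, including why duplicates arise, can be sketched as a small simulation. The per-attempt success probabilities `p_publish` and `p_puback`, the function name `qos1_send`, and the use of a set as the broker's deduplication store are assumptions made for illustration only.

```python
import random
from typing import Set, Tuple


def qos1_send(packet_id: int, p_publish: float, p_puback: float,
              rng: random.Random, broker_store: Set[int]) -> Tuple[int, int]:
    """Send one small packet with QoS1-style retransmission.

    Returns (attempts, duplicates_removed). p_publish and p_puback are
    assumed per-attempt success probabilities, not values from the article.
    """
    attempts = 0
    duplicates = 0
    while True:
        attempts += 1
        if rng.random() >= p_publish:
            continue  # PUBLISH lost: the timer expires and the gateway resends
        if packet_id in broker_store:
            duplicates += 1  # broker already holds this packet: deduplicate
        else:
            broker_store.add(packet_id)
        if rng.random() < p_puback:
            return attempts, duplicates  # PUBACK received: transfer complete
        # PUBACK lost: the gateway resends although the broker has the data


rng = random.Random(1)
store: Set[int] = set()
results = [qos1_send(pid, 0.7, 0.7, rng, store) for pid in range(200)]
total_attempts = sum(a for a, _ in results)
total_duplicates = sum(d for _, d in results)
```

Every packet ends up in the broker store, which matches the zero packet-loss ratio, while the extra attempts and removed duplicates correspond to the additional transmission and deduplication delay.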
Packet-loss ratio model
Since QoS1 adopts retransmission to ensure successful data transmission, the packet-loss ratio of the ith large packet is zero.
Delay model
There are two transmission processes in QoS1 level, that is, PUBLISH packet transmission and PUBACK packet feedback. When the above two processes are successful, the transmission process of a small packet is completed. Define
We define
Then, the transmission delay of the jth small packet of the ith large packet in QoS1 level is given by
where
In order to simplify the model, we assume that the data deduplication delay of different small packets is uniformly defined as
The total delay of the ith large packet in QoS1 level is given by
QoS2 level
QoS2 ensures that messages are delivered exactly once through two interaction processes by means of PUBLISH, PUBREC, PUBREL, and PUBCOMP packets. In the first interaction process, after the gateway sends the PUBLISH packet to the broker server, if a PUBREC packet is not received within a certain time, the PUBLISH packet is retransmitted until the PUBREC packet is successfully received. If a duplicate PUBLISH packet is received at the broker server, it is deleted immediately. In the second interaction process, upon receiving the PUBREC packet, the gateway responds to the broker server with a PUBREL packet and waits for the feedback PUBCOMP packet. Similarly, if the PUBCOMP packet is not received within a certain time, the PUBREL packet is retransmitted until the PUBCOMP packet is successfully received. Therefore, the QoS2 level ensures that each packet is successfully received without duplication.
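The two interaction processes can be sketched as follows. This is a simplified simulation under stated assumptions: a single per-packet success probability `p_success` is used for all four packet types, the broker releases the message on the first PUBREL, and the class and function names (`Qos2Broker`, `qos2_send`) are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


class Qos2Broker:
    """Minimal broker state for an exactly-once handshake (illustrative)."""

    def __init__(self) -> None:
        self.pending: Dict[int, bytes] = {}           # received, awaiting PUBREL
        self.delivered: List[Tuple[int, bytes]] = []  # released to subscribers
        self.completed: Set[int] = set()

    def on_publish(self, packet_id: int, payload: bytes) -> None:
        # A duplicate PUBLISH for a known packet id is discarded immediately.
        if packet_id not in self.pending and packet_id not in self.completed:
            self.pending[packet_id] = payload

    def on_pubrel(self, packet_id: int) -> None:
        # Release the message once; a repeated PUBREL releases nothing more.
        if packet_id in self.pending:
            self.delivered.append((packet_id, self.pending.pop(packet_id)))
            self.completed.add(packet_id)


def qos2_send(broker: Qos2Broker, packet_id: int, payload: bytes,
              p_success: float, rng: random.Random) -> Tuple[int, int]:
    """Return the number of attempts used in each interaction process."""
    first = 0
    while True:  # first interaction: PUBLISH until PUBREC arrives
        first += 1
        if rng.random() < p_success:        # PUBLISH reaches the broker
            broker.on_publish(packet_id, payload)
            if rng.random() < p_success:    # PUBREC reaches the gateway
                break
    second = 0
    while True:  # second interaction: PUBREL until PUBCOMP arrives
        second += 1
        if rng.random() < p_success:        # PUBREL reaches the broker
            broker.on_pubrel(packet_id)
            if rng.random() < p_success:    # PUBCOMP reaches the gateway
                break
    return first, second


rng = random.Random(2)
broker = Qos2Broker()
attempts = [qos2_send(broker, pid, b"reading", 0.8, rng) for pid in range(100)]
```

Because duplicates are filtered on the packet identifier in both stages, each message is released exactly once even when PUBREC or PUBCOMP packets are lost and the gateway retransmits.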
Packet-loss ratio model
Since QoS2 also adopts retransmission to ensure successful data transmission, the packet-loss ratio of the ith large packet is zero.
Delay model
There are four processes, that is, PUBLISH packet transmission, PUBREC packet feedback, PUBREL packet transmission, and PUBCOMP packet feedback in QoS2 level. When the above processes are successful, the transmission of a small packet is completed. The PUBREL packet will be transmitted only after the PUBREC packet is successfully fed back.
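As a rough intuition for how retransmissions inflate the QoS2 delay, the expected number of attempts can be worked out under an added simplifying assumption: the attempts within each interaction process succeed independently with a fixed probability. The symbols below are illustrative and are not the notation of this article.

```latex
% Illustrative symbols (not from the article):
%   p_1: probability that one PUBLISH/PUBREC attempt succeeds
%   p_2: probability that one PUBREL/PUBCOMP attempt succeeds
%   N_1, N_2: number of attempts needed in each interaction process
\begin{align}
  \Pr(N_k = n) &= (1 - p_k)^{n-1} p_k, \quad n = 1, 2, \dots, \quad k \in \{1, 2\}, \\
  \mathbb{E}[N_k] &= \frac{1}{p_k}, \\
  \mathbb{E}[N_1 + N_2] &= \frac{1}{p_1} + \frac{1}{p_2}.
\end{align}
% Example: p_1 = p_2 = 0.8 gives 1/0.8 + 1/0.8 = 2.5 expected attempts.
```

Since the second interaction process starts only after the first one succeeds, the two attempt counts add rather than overlap, which is why the QoS2 delay exceeds that of a single acknowledged exchange under the same channel conditions.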
In the first interaction process, we define
Therefore, the transmission delay of the jth small packet of the ith large packet in the first interaction process in QoS2 level is given by
where
The transmission delay of the jth small packet of the ith large packet in the second interaction process in QoS2 level is given by
where
Since there is no data deduplication process in QoS2 level, the total delay of the ith large packet is the sum of the transmission delays of the two interaction processes, which is given by
Problem formulation
To solve the differentiated QoS guarantee problem in EIoT, the optimization objective is defined to minimize the weighted sum of packet-loss ratio and delay under the QoS level selection constraint. The optimization problem is formulated as
where
Delay-reliability-aware MQTT QoS level selection in EIoT
Problem transformation
MAB is an efficient reinforcement learning tool to cope with the sequential decision problems under incomplete information. 38 It describes a sequence of exploration–exploitation decision-making processes.39,40 The MAB model is mainly composed of decision makers, arms, and rewards. 41 In each round, the decision maker selects an arm, and the selected arm will generate a reward. 42 The decision maker aims to maximize its reward by exploiting the empirically optimal arm or exploring non-optimal arms.
In this paper, we transform
Decision Maker: Gateways are defined as the decision makers.
Arm: The three QoS levels of MQTT protocol are abstracted as arms, that is,
Reward: The reward of selecting the mth QoS level is defined as the reciprocal of the weighted sum of packet-loss ratio and delay, which is given by
The proposed DR-MQLS algorithm
DR-MQLS estimates the reward based on historical observations and considers estimation uncertainty through the confidence bound based on UCB. 43 Therefore, the gateway estimates its preference 44 toward mth QoS level as
Here,
Then, the gateway selects the QoS level with the maximum estimation value, which is denoted as
Therefore, DR-MQLS draws that
The implementation procedure of the proposed algorithm is summarized in Algorithm 1, which is divided into three phases, as follows:
Initialization: Initialize all the indicator variables as zero, that is,
Estimation and QoS Selection: The gateway calculates its preference toward the mth QoS level as equation (16) and selects the optimal QoS level
Learning: The gateway observes the packet-loss result and transmission delay of each small packet. Then the packet-loss ratio and delay of each large packet are calculated. Finally, calculate the reward as equation (15), update
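The three phases above can be sketched in Python as follows. This is a minimal sketch, not a reproduction of Algorithm 1 or of equations (15) and (16): the index uses the standard UCB1 form (empirical mean plus a square-root confidence term), the exploration constant is an assumed default, and the environment `toy_transmit` with its loss and delay values is entirely hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def ucb_qos_selection(
    transmit: Callable[[int], Tuple[float, float]],
    num_packets: int,
    delay_weight: float,
    loss_weight: float,
    exploration: float = 2.0,
    num_levels: int = 3,
) -> List[int]:
    """Select a QoS level for each large packet with a UCB1-style rule.

    transmit(level) returns the observed (packet_loss_ratio, delay) of one
    large packet. The exploration constant and the exact index form are
    standard UCB1 choices assumed here, not taken from the article.
    """
    counts = [0] * num_levels          # times each level was selected
    mean_reward = [0.0] * num_levels   # running average reward per level
    choices: List[int] = []

    for t in range(1, num_packets + 1):
        if t <= num_levels:
            level = t - 1              # initialization: try every level once
        else:
            # Estimation: empirical mean plus confidence bound per level.
            index = [
                mean_reward[m] + math.sqrt(exploration * math.log(t) / counts[m])
                for m in range(num_levels)
            ]
            level = max(range(num_levels), key=index.__getitem__)

        # Learning: observe loss and delay, then update the statistics.
        loss_ratio, delay = transmit(level)
        cost = loss_weight * loss_ratio + delay_weight * delay
        reward = 1.0 / cost            # reciprocal of the weighted sum

        counts[level] += 1
        mean_reward[level] += (reward - mean_reward[level]) / counts[level]
        choices.append(level)
    return choices


rng = random.Random(0)


def toy_transmit(level: int) -> Tuple[float, float]:
    """Hypothetical environment: (packet-loss ratio, delay in arbitrary units)."""
    base_delay = [1.0, 2.0, 3.5][level]
    loss = 0.3 + rng.uniform(-0.05, 0.05) if level == 0 else 0.0
    return loss, base_delay * rng.uniform(0.9, 1.1)


choices = ucb_qos_selection(toy_transmit, 2000, delay_weight=1.0, loss_weight=10.0)
```

In this toy setting QoS1 has the lowest weighted cost, so the selection frequency concentrates on level 1 as the confidence terms shrink. Changing the two weights shifts which level is preferred, which mirrors how the weighting expresses service-specific QoS requirements.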
Complexity analysis
The computational complexity of DR-MQLS is composed of three parts. The computational complexity of the first phase is
Simulation results
In this section, we validate the performance of DR-MQLS through simulations. The single and fixed QoS level selection strategies, that is, only selecting a specific QoS level for data transmission, for example, QoS0, QoS1, and QoS2, are used for comparison. We assume that there are a total of 800 large packets to be transmitted. The channel gain is randomly distributed within
Table 1. Simulation parameters.
Figure 3 shows the weighted sum of packet-loss ratio and delay versus the number of large packet transmissions. The simulation results show that after 200 large packet transmissions, all the curves show a downward trend, and the curve of QoS0 decreases the fastest. The reason is that the packet-loss ratio of QoS0 decreases due to the channel gain improvement after 200 large packet transmissions, while QoS1 and QoS2 are less affected by the channel gain because of the retransmission mechanism. DR-MQLS outperforms the single and fixed QoS level selection strategies of QoS0, QoS1, and QoS2 in weighted sum of packet-loss ratio and delay by

Figure 3. The weighted sum of packet-loss ratio and delay versus the number of large packet transmissions.
Table 2 shows the delay versus the number of large packet transmission. Simulation result demonstrates that the delay of DR-MQLS is slightly higher than QoS0. The reason is that there is no retransmission mechanism and deduplication process in QoS0 level. It performs best in terms of delay, but sacrifices the packet-loss ratio as shown in Figure 3. When
Table 2. Average delay versus the number of large packet transmissions.
DR-MQLS: delay-reliability-aware MQTT QoS level selection.
Figure 4 shows the optimal QoS level selection probability versus the number of large packet transmission. The optimal QoS level selection probability of DR-MQLS converges to 60.10% when the number of large packet transmission reaches

Figure 4. The optimal QoS level selection probability versus the number of large packet transmissions.
Figure 5 shows the weighted sum of packet-loss ratio and delay versus

Figure 5. The weighted sum of packet-loss ratio and delay versus
Figure 6 shows the impact of

Figure 6. The impact of
Conclusion
In this paper, aiming at the QoS guarantee problem for EIoT based on MQTT protocol, we proposed a UCB-based delay-reliability-aware MQTT QoS level selection algorithm named DR-MQLS to minimize the weighted sum of packet-loss ratio and delay under incomplete information. Compared with the single and fixed QoS level selection strategies, that is, QoS0, QoS1, and QoS2, DR-MQLS can reduce the weighted sum of packet-loss ratio and delay by
Footnotes
Handling Editor: Peio Lopez Iturri
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was financially supported by the Science and Technology Project of State Grid Corporation of China under grant number 52094021N010 (5400-202199534A-0-5-ZN).
