Brokel: Towards enabling multi-level cloud elasticity on publish/subscribe brokers

Abstract

Internet of Things networks, together with the data that flow between networked smart devices, are growing at unprecedented rates. Brokers, or intermediary nodes, combined with the publish/subscribe communication model are among the most widely used strategies to enable Internet of Things applications. From the scalability viewpoint, cloud computing and its main feature, resource elasticity, appear as an alternative to over-provisioned clusters, which normally have a fixed number of resources. However, we perceive that today the combination of elasticity and Pub/Sub presents several limitations, mainly related to application rewriting, cloud elasticity limited to a single level, and false-positive resource reorganization actions. Aiming at bypassing these problems, this article proposes Brokel, a multi-level elasticity model for Pub/Sub brokers. Users, things, and applications use Brokel as a centralized messaging service broker, but in the back-end the middleware provides better performance and cost (used resources × performance) on message delivery using virtual machine (VM) replication. Our scientific contribution lies in the multi-level elasticity, which covers both orchestrators and brokers, and in the addition of a geolocation domain name system service to define the most suitable entry point in the Pub/Sub architecture. Different execution scenarios and metrics were employed to evaluate a Brokel prototype using VMs that encapsulate the functionalities of Mosquitto and RabbitMQ brokers. The obtained results were encouraging in terms of application time, message throughput, and cost (application time × resource usage) when comparing elastic and non-elastic executions.

Keywords

Internet of Things publish subscribe communication cloud elasticity resource management multi-level

Introduction

The Internet of Things (IoT) is considered the next evolution in the era of computing after the Internet.^1,2 In the IoT paradigm, many objects around us will be connected in networks and will communicate with each other without any assistance or human intervention. IoT devices implement different types of services for applications related to smart health, smart farming, smart city, among others. Normally, these services use a middleware layer that can act as a messaging service, receiving information, storing, and processing it, in addition to making automatic decisions based on the environment status and historical data.³ Often, brokers or intermediate nodes (also named in some cases as superpeers) are used in this layer to enable communication among applications and things in a publish/subscribe (Pub/Sub) fashion.⁴ Pub/Sub is a communication paradigm for asynchronous data dissemination among services and users.⁵ In many Pub/Sub systems, clients are able to register subscriptions in a particular broker. In turn, brokers receive post messages from publishers and perform the filtering according to the subscriptions.^6,7

As an indirect communication pattern,⁸ where publishers and subscribers sometimes do not have knowledge of each other, Pub/Sub is useful to implement several IoT scenarios and applications. A smart object is responsible for generating events based on the collected information, which can be used not only by other things or high-level applications, but also further processed by other modules (e.g. an event-processing unit).⁴ For instance, we can model a Pub/Sub system that collects patients’ vital data via a network of sensors connected to medical devices and delivers the data to a medical center for storage and processing, guaranteeing ubiquitous access to medical data in the electronic healthcare record (EHR) format. Figure 1(a) depicts a standard Pub/Sub deployment with a single broker. However, this organization suffers from the traditional problem of centralized systems, that is, scalability. In other words, this deployment sometimes cannot provide quality-of-service (QoS) as the number of network clients grows: the larger the amount of data in the system, the larger the infrastructure required to store, process, and present it efficiently in real time.

Figure 1.

Pub/Sub deployment approaches: (a) centralized, (b) P2P-like distributed style, and (c) Brokel proposal.

Figure 1(b) illustrates a possibility to explore scalability using a Peer-to-Peer (P2P) network of brokers. This organization could be assembled using a cluster or grid infrastructure. Traditionally, both clusters and grids have a fixed number of resources that must be maintained in terms of infrastructure configuration, scheduling (where tools such as PBS (http://www.pbspro.org), OAR (http://oar.imag.fr), and open grid scheduler (http://gridscheduler.sourceforge.net) are usually employed for resource reservation and job scheduling), and energy consumption. The resource requirements of IoT Pub/Sub applications will inevitably fluctuate over time.⁹ Thus, parallel machines such as clusters and grids may lead to either under-provisioning or over-provisioning situations, incurring performance and cost penalties, in addition to not addressing the growing demand of IoT systems.¹⁰ IoT platforms should provide facilities to adapt (i.e. dynamically reconfigure) the allocated resources according to the perceived changes in the environment.

Cloud Computing can provide the virtual infrastructure to support the IoT paradigm by integrating monitoring, storage, analytical tools, visualization platforms, and client access.^8,10,11 Unlike clusters and grids, cloud providers offer features to hide all the complexity and functionalities necessary to implement an IoT ecosystem.¹² Among the main features, it is possible to emphasize the elasticity which allows users to change the cloud capacity at any time dynamically.¹³ Through the on-demand and pay-as-you-go provisioning principle, the interest in elasticity is related to the benefits it can deliver: better performance, improved resource utilization, and reduced costs. In this way, we can allocate a small number of resources at application launching time, so this number is moldable at runtime without user interference.

Although several principles have been studied for cloud computing elasticity and IoT systems, we have not seen such principles for engineering IoT Pub/Sub systems on top of cloud resource reorganization. Our review of the state-of-the-art in the aforementioned topics revealed five gaps: (1) obligation to change the application source code or to develop additional scripts to transform a non-elastic application into an elastic one;^1–3,14 (2) use of proprietary software components, not available to buy or download;^2,3 (3) system scalability, but only for a particular set of Pub/Sub middlewares;^1–3,14 (4) virtual machine (VM) thrashing, that is, launching elasticity actions prematurely on sporadic peaks;^1–3,14 and (5) single-level elasticity support.^11,15–17 Regarding (5), we perceive that cloud elasticity is widely explored in Web systems, where a single entry point acts as load balancer and elasticity manager. However, a research question arises: what happens if the entry point itself becomes overloaded?

Aiming at filling these gaps, we are proposing in this article a model named Brokel, a multi-level elasticity model for Pub/Sub brokers. Thus, users, things, and applications employ Brokel as a centralized messaging service broker, which is responsible for providing better performance and cost (used resources × performance) on message delivery using cloud elasticity. Figure 1(c) presents the main ideas of our proposal, which includes two main levels: orchestrator and broker. Particularly, the orchestrator is a component in charge of managing the distribution of client requests among the brokers. The term orchestrator is commonly used to define a component with administration functions, as presented in Hurtle (http://hurtle.it/), Open Baton,¹⁸ LiveCloud,¹⁹ and Roboconf.²⁰ The multi-level keyword is explored in two elasticity levels, orchestrator and broker, and through the addition of a geolocation domain name system (DNS) service to define the most suitable entry point in the Pub/Sub architecture. The broker level is responsible for processing and handling messages, while orchestrator level performs load balancing and elasticity among brokers. The scientific contribution of Brokel relies on offering a peak-aware agnostic elastic Pub/Sub model, which can be implemented in any Pub/Sub system in the market, including RabbitMQ (https://www.rabbitmq.com) and Mosquitto (https://mosquitto.org). The peak-aware feature enables Brokel to avoid false-positive VM allocations, avoiding scaling in or out operations when only a load peak is observed.

In the next sections, we present more details about Brokel. First, we introduce related work and then describe Brokel in detail, including its architecture and elasticity decision making. Second, we present the evaluation methodology in terms of client application, test scenarios, and observed metrics. Finally, we present experiment results and final considerations. In particular, in the last part of the article, we emphasize again the scientific contribution of the work also detailing several challenges that we can address in the future.

Related work

This section presents some initiatives that guided us on developing Brokel. Here, we are handling Pub/Sub systems and how they address scalability.

Running in cloud environments, E-STREAMHuB covers a mechanism for automatically expanding and reducing resources for Pub/Sub services with content-based filtering.²¹ The E-STREAMHuB system is an extension of STREAMHuB,¹⁷ a scalable but static Pub/Sub engine. Subscribe messages are sent via unicast to a selected operator, while publish messages are transmitted through broadcast to all operators. E-STREAMHuB introduces data parallelism on Pub/Sub systems, enabling several publish messages to be matched in parallel against the registered subscriptions. However, if the number of publish messages grows quickly, the whole solution will be overloaded, since all messages are sent to all operators for matching purposes. Elastic Queue Service (EQS)¹⁵ is a message queuing architecture classified as topic-based Pub/Sub, designed to allow elastic scalability, performance, and high availability. Its design targets deployment on a Cloud Computing infrastructure using Infrastructure as a Service (IaaS). The EQS elasticity is obtained by migrating topics between processing nodes. If the migration is not enough, the model can instantiate new replicas to meet the growing demand. Migration of threads between nodes can lead to cost and potential service disruption.

Blue Dove²² seeks to deploy attribute-based Pub/Sub services in public cloud environments. Available nodes are divided into regions, each one with its attributes. Subscribe and publish messages are sent to the servers that manage a particular region. Its architecture has two layers. The first concerns Dispatcher Servers, which play the role of front-end, directing publishers and subscribers to nodes that manage a given region. The second works as the system back-end, processing received messages and forwarding them to subscribers when necessary. Its approach only provides attribute-based filtering and does not allow extraction of contents from subscribe messages on the server side. Blue Dove partially supports elasticity, allowing only scaling out operations without supporting any kind of resource release in under-utilization cases. In Wang and Ma,²³ the authors present a content-based, scalable, and elastic Pub/Sub service called general scalable and elastic content-based Pub/Sub service (GSEC). GSEC proposes a framework with two layers, which uses a hybrid space partitioning technique to achieve a high throughput rate, dividing subscriptions into multiple clusters in a hierarchical way. In addition, a helpers-based content distribution technique is proposed to achieve high upload bandwidth, in which servers act as providers and coordinators to fully exploit the system’s upload capability.

X Ma et al.⁵ focus on a hierarchical architecture of servers to provide low latency in event matching of attribute-based Pub/Sub systems. Their solution divides all servers into two specific layers that define their function. Servers in the first layer act as load balancers, applying a hash strategy to define the broker to which a request is sent. The brokers act in the second layer, processing client requests and matching events to subscriptions. In addition, a component named PDetection is in charge of monitoring the requests’ waiting time in the broker layer and performs elasticity actions by comparing it with defined thresholds. After an elastic action, PDetection performs all reconfigurations in brokers and load balancers to adjust to the new configuration.

Elastic computing framework (ECF)²⁴ presents an architecture in which the programmability of front-end machines is drawn upon to dynamically divide data processing tasks into (1) those that see all of the data but perform simple operations on it and (2) those that see a transformed subset of the data and extract deep insights from it. This lifts pressure on the network that links the front-end to the back-end and achieves capability-adaptive deployment of IoT solutions. In Bellavista and Zanni,²⁵ the authors propose a distributed architecture combining machine-to-machine industry-mature protocols (i.e. Message Queue Telemetry Transport (MQTT) and Constrained Application Protocol (CoAP)) to enhance the scalability of gateways for efficient IoT–cloud integration. In addition, the article presents how they have applied the approach in practice, efficiently and effectively extending the implementation of the open-source gateway that is available in the industry-oriented Kura framework for IoT. This architecture is used by Bellavista and Zanni²⁶ to propose a scalability solution for IoT gateways via MQTT–CoAP integration. The authors use the hierarchical structure of MQTT to enable an easy way to manage messages. Using CoAP, the solution provides a CoapTreeHandler responsible for dealing with changes in the hierarchy. Through this strategy, it is possible to add or remove entities without needing topic reconfiguration.

Table 1 presents a comparison among the aforementioned research initiatives. We perceive that some academic research focuses on providing cloud elasticity for Pub/Sub systems. However, this combination remains an open challenge when it comes to elasticity support for middlewares. Alternatives have to deal with limitations related either to the difficulty of adapting the application to take advantage of elasticity or to middleware implementation restrictions. A common strategy present in the literature to transform a non-elastic application into an elastic one requires changes in the application source code or development of additional scripts.^5,15,21–23,26 This makes the adoption of resource reorganization in cloud-based Pub/Sub applications more difficult, since developers/administrators must have a deep knowledge of both the system environment and the application code. Moreover, we perceive that elasticity is mainly addressed by solutions employing threshold-based reactive techniques, combining horizontal and replication techniques.^5,15,21–23 Such a strategy requires user experience in configuring specific parameters and deploying cloud resources. However, the use of load thresholds to guide resource reorganization can cause the problem of VM thrashing, where sporadic peaks exceeding one of the thresholds can result in an undesired elasticity action, that is, a false-positive action. Finally, the main limitation of the state-of-the-art today refers to the elasticity support, which is offered only in a single level.^5,16,24,27–29 What happens when users have an additional broker and need to inform or rewrite their own applications to deal with this new broker? Or yet, how can users know the best broker to receive the client (publisher or subscriber) requests in order to reduce communication latency in this interaction?

Table 1.

Comparison of related work with respect to seven relevant Pub/Sub features.

Related work	Filtering	Elasticity	Method	Model	Operation	Modification	Restrictions
E-STREAMHuB²¹	Content	Yes, multi-level	Horizontal and migration	Reactive automatic	Process migration and VM replication	Yes, in the Middleware/Broker	Applies only to STREAMHuB Middleware
EQS¹⁵	Topic	Yes, only on one level	Horizontal and migration	Reactive automatic	Process migration and VM replication	Yes, in the Middleware/Broker	The Middleware must support process migration
Blue Dove²²	Attribute	Only adding resources, multi-level	Horizontal	Reactive automatic	VM replication	Yes, in the Middleware/Broker	Applies only to one Middleware
GSEC²³	Content	Yes, multi-level	Horizontal	Reactive automatic	VM replication	Yes, in the Middleware/Broker	Applies only to one Middleware
ECF²⁴	Content and topic	Yes, only on one level	Horizontal	Reactive automatic	Process migration	Yes, in the Middleware/Broker	Applies only to one Middleware
Bellavista and Zanni²⁵	Topic	No	−	−	−	Yes, in the Middleware/Broker	Applies only to one Middleware
X Ma et al.⁵	Attribute	Yes, only on one level	Horizontal	Reactive automatic	VM replication	Yes, in the Middleware/Broker	Applies only to Matchers
Bellavista and Zanni²⁶	Topic	No	−	−	−	Yes, in the Middleware/Broker	Applies only to one Middleware

VM: virtual machine.

Brokel proposal

Brokel is a multi-level cloud elasticity model for Pub/Sub Brokers. The model also includes Orchestrator replicas, which are responsible for load balancing clients’ requests between the active Broker instances. Brokel offers a two-level elasticity approach providing elasticity for Brokers and Orchestrators. In this section, we present Brokel, including its architecture, application model, and elasticity mechanism. In the next subsections, we discuss details about the model. Therefore, in the first place, we present the design decisions. Also, we describe details about the architecture and model components. Finally, we discuss the application model and the elasticity decisions.

Design decisions

Brokel provides reactive and horizontal elasticity transparently for users, things and applications in both Broker and Orchestrator levels. We adopted horizontal elasticity by applying VM replication since vertical elasticity is limited by the resources available in a particular physical node. In addition, the majority of operating systems do not allow the addition of resources at runtime without a system reboot.^30,31 To develop Brokel, we adopted the following design decisions:

Users do not need to configure the elasticity mechanism; however, they can input elasticity thresholds;

Developers do not need to rewrite their source code to take advantage from elastic resources;

The model is agnostic at the Pub/Sub system viewpoint, since the code of Brokers is encapsulated in VM templates;

The Broker provides authentication to the actors who connect to it;

The cloud architecture considers homogeneous resources, that is, each VM has the same hardware configuration;

Resource provisioning is accomplished through a threshold-based reactive, horizontal, and automatic elasticity mechanism with VM replication;

The elasticity runs in two levels: Broker replication and Orchestrator replication.

Architecture

Brokel operates at the PaaS (Platform as a Service) level of a cloud, acting as a middleware that transforms non-elastic Pub/Sub applications into elastic ones. Figure 2 depicts the main ideas of Brokel, highlighting the communication path for information exchange. The model focuses on enabling Pub/Sub Brokers to take advantage of cloud elasticity without needing source code modifications. In this way, Brokel provides elasticity by allocating and consolidating VMs with replicas of application services. Additionally, the Orchestrator plays an important role in the architecture since it is in charge of balancing clients’ requests among Brokers. From the user’s viewpoint, the Orchestrator acts as a web server, processing clients’ requests as if it were a Broker itself. The user does not need to change the application to connect to an Orchestrator, since the Orchestrator dispatches the requests to a Broker. Furthermore, aiming at providing a multi-level elasticity model, Brokel also provides elasticity for Orchestrators. The model maintains a DNS server that clients query to find an appropriate entry point. The server returns a single address for a particular Orchestrator, thus acting as a load balancer for the Orchestrator level.
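The DNS-based entry-point selection described above can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the class name `OrchestratorDns`, the method `resolve`, and the round-robin selection policy are hypothetical (the text only says that the server returns a single Orchestrator address), and a real deployment would answer DNS queries rather than Python calls.

```python
class OrchestratorDns:
    """In-memory stand-in for the DNS service that picks an Orchestrator.

    Hypothetical sketch: round robin is an assumed selection policy.
    """

    def __init__(self):
        self._addresses = []  # addresses of the active Orchestrator VMs
        self._next = 0        # round-robin cursor

    def add_dns(self, address):
        """Register a new Orchestrator so that clients can reach it."""
        if address not in self._addresses:
            self._addresses.append(address)

    def remove_dns(self, address):
        """Stop handing out an Orchestrator that is about to be removed."""
        if address in self._addresses:
            self._addresses.remove(address)

    def resolve(self):
        """Return a single Orchestrator address for the querying client."""
        if not self._addresses:
            raise LookupError("no active Orchestrator")
        address = self._addresses[self._next % len(self._addresses)]
        self._next += 1
        return address
```

Because every query yields exactly one address, clients keep using a single endpoint even though the set of Orchestrators behind it changes over time.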

Figure 2.

Brokel ideas representing how client requests are distributed among the Orchestrators and Brokers.

To illustrate Brokel architecture, Figure 3 presents its three main components, each one running in a particular VM: (1) Broker, (2) Orchestrator, and (3) Elasticity Manager. The Broker is in charge of processing clients’ requests. Each Broker VM runs a particular message broker system, which is managed by the Pub/Sub provider. Brokel is generic enough to cover all message broker systems. In turn, the Orchestrator is in charge of routing clients’ requests to the active Brokers transparently. Acting as a wrapper, the Orchestrator forwards requests to a suitable Broker and, if applicable, receives the Broker delivery and forwards it to clients. Moreover, elasticity is provided by the Elasticity Manager in both Broker and Orchestrator levels. This component performs all activities related to cloud resources and communication topology reorganizations. Additionally, it also provides the DNS service providing load balance among the Orchestrator VMs.

Figure 3.

The Brokel architecture. The number of Broker VMs is m and the number of Orchestrator VMs is n. The total number of VMs running in the cloud is v, computed as $v = n + m + 1$, where the additional VM hosts the Elasticity Manager.

Brokel levels

At the Broker level, the system operates as a traditional Pub/Sub system, which means that the provider does not need to change its system behavior. A Broker receives requests from both clients and Orchestrator processes, and it does not need to be aware of whether a request came from an Orchestrator or directly from a client. For subscribe requests, it just updates its distribution list. For publish requests, the Broker generates a client distribution list and sends the message to these clients. At the Orchestrator level, a particular Orchestrator forwards clients’ requests to Brokers in different ways depending on the type of the request. In particular, the Orchestrator processes send requests to and receive data from Brokers as if they were clients. When receiving a publish request, an Orchestrator forwards the message to one of the active Brokers, selected with a round robin strategy. We selected this algorithm due to its simplicity and efficiency for load balancing between servers. When handling a subscribe request, in contrast, the Orchestrator connects to all Brokers individually and sends the message to each of them. This strategy guarantees that all Brokers have their subscriber lists updated. Additionally, it enables a one-to-n communication topology that is not supported by default in a transmission control protocol (TCP) implementation.
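The two routing rules of the Orchestrator level (round robin for publish requests, fan-out to every Broker for subscribe requests) can be illustrated with a minimal sketch. The class `Orchestrator`, the method `route`, and the string request kinds are hypothetical names chosen for this example; Brokers are represented only by identifiers, and no network communication is performed.

```python
class Orchestrator:
    """Chooses the target Brokers for a request (hypothetical sketch)."""

    def __init__(self, brokers):
        self.brokers = list(brokers)  # identifiers of the active Brokers
        self._cursor = 0              # round-robin position for publishes

    def route(self, kind):
        """Return the Brokers that must receive a request of this kind."""
        if not self.brokers:
            raise RuntimeError("no active Broker")
        if kind == "subscribe":
            # Every Broker must know every subscription.
            return list(self.brokers)
        if kind == "publish":
            # Exactly one Broker processes each published message.
            broker = self.brokers[self._cursor % len(self.brokers)]
            self._cursor += 1
            return [broker]
        raise ValueError(f"unknown request kind: {kind!r}")
```

With three Brokers, consecutive publish requests visit them in turn, while each subscribe request reaches all three.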

Brokel Elasticity Manager

Brokel Manager is in charge not only of DNS services, but also of managing all operations related to resource monitoring and elasticity actions. Therefore, the Manager takes the processing load on VMs as input data for elasticity decision making. The Manager executes all cloud reconfigurations by sending requests to the Cloud Front-End, which executes all operations related to the cloud environment by receiving requests through the cloud provider application programming interface (API). The Manager observes the central processing unit (CPU) occupation levels of the cloud resources at constant time intervals, evaluating the need for elasticity. To illustrate its functioning, Figure 4 shows the main monitoring phases that the Brokel Manager executes. At each monitoring observation, the Manager collects data from the available resources and evaluates it afterward for elasticity decision making. When scale out operations take place, the new resources require some time to be actually available for the application. This happens due to the transfer of the VM image to a physical node and the boot overhead of the VM operating system. The Brokel Manager is aware of this process and delivers the new resources only after they are in fact online. Thus, after starting the allocation process, the Manager continues its execution flow and, at the beginning of each monitoring observation, checks whether the new resources are online.
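The non-blocking handling of VMs that are still booting can be sketched as follows. This is a simplified illustration, not the prototype's code: the class `MonitoringCycle`, its methods, and the `is_online` probe are hypothetical, and the probe stands in for whatever check the cloud API offers.

```python
class MonitoringCycle:
    """Tracks requested VMs and delivers them once they are online."""

    def __init__(self, is_online):
        self.is_online = is_online  # hypothetical probe: vm -> bool
        self.active = []            # VMs already serving the application
        self.pending = []           # VMs requested but still booting

    def request_scale_out(self, vm):
        """Start an allocation without waiting for the VM to boot."""
        self.pending.append(vm)

    def observe(self):
        """Begin one monitoring observation: deliver VMs that are ready."""
        ready = [vm for vm in self.pending if self.is_online(vm)]
        self.pending = [vm for vm in self.pending if vm not in ready]
        self.active.extend(ready)
        return ready
```

The Manager never blocks on a boot: a VM requested in one observation simply becomes part of `active` in the first later observation in which the probe reports it online.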

Figure 4.

Brokel Manager computation phases.

In particular, after performing an elasticity action, the Manager needs to reorganize the communication topology. To clarify this process, Algorithms 1 and 2 present pseudo-codes with the main operations the Manager does in each cloud elasticity moment. Algorithm 1 presents the pseudo-code when a scaling out operation takes place. When adding a new Broker VM (line 4), the Elasticity Manager configures this new Broker with the current subscribers list (line 5). Specifically, line 8 of the algorithm represents the operation where the Elasticity Manager reconfigures all active Orchestrator processes including a Broker in its lists. In case of adding a new Orchestrator (line 12), the Elasticity Manager configures the list of Brokers in this new Orchestrator (line 13) and updates the DNS service (line 14). However, instead of connecting with external VMs, when updating the DNS the Manager only needs to reconfigure its own DNS service.

Algorithm 1. Pseudo-code of a scaling out operation that the Elasticity Manager performs.
1 Input: Scaling out operation
2 Output: New resource
3 if (add_broker()) then
4   add_brokerVM(cloud_frontend);
5   copy_subscribers_to(new_broker);
6   for (i = 0; i < amount_of_orchestrator; i++) do
7     connect_to(orchestrator[i]);
8     config(orchestrator[i], new_broker);
9     disconnect();
10   end for
11 else
12   add_orchestratorVM(cloud_frontend);
13   config_broker_list(new_orchestrator);
14   add_dns(new_orchestrator);
15 end if
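As a rough, executable counterpart to Algorithm 1, the sketch below models the topology state with plain dictionaries and lists instead of VMs and network connections. The function names, the data layout (Orchestrator name mapped to its Broker list, Broker name mapped to its subscriptions), and the omission of the cloud front-end call are assumptions made for illustration.

```python
def add_broker(orchestrators, brokers, subscriptions, new_broker):
    """Scale out at the Broker level (Algorithm 1, lines 4 to 10)."""
    # Line 5: the new Broker starts with the current subscribers list.
    brokers[new_broker] = list(subscriptions)
    # Lines 6 to 10: every active Orchestrator learns about the new Broker.
    for broker_list in orchestrators.values():
        broker_list.append(new_broker)


def add_orchestrator(orchestrators, brokers, dns_entries, new_orchestrator):
    """Scale out at the Orchestrator level (Algorithm 1, lines 12 to 14)."""
    # Line 13: the new Orchestrator receives the list of active Brokers.
    orchestrators[new_orchestrator] = list(brokers)
    # Line 14: the Manager registers it in its own DNS service.
    dns_entries.append(new_orchestrator)
```

Note the ordering in `add_broker`: subscriptions are copied before any Orchestrator can route to the new Broker, so a publish request never reaches a Broker with an empty distribution list.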

Algorithm 2. Pseudo-code of a scaling in operation that the Elasticity Manager performs.
1 Input: Scaling in operation
2 Output: Remove resource
3 if (remove_broker()) then
4   for (i = 0; i < amount_of_orchestrator; i++) do
5     connect_to(orchestrator[i]);
6     remove_broker(orchestrator[i], broker);
7     disconnect();
8   end for
9   while broker is processing request do
10     wait();
11   end while
12   remove_VM(cloud_frontend, broker);
13   notify_orchestrators();
14 else
15   remove_dns(orchestrator);
16   while orchestrator is processing request do
17     wait();
18   end while
19   remove_VM(cloud_frontend, orchestrator);
20 end if

When removing either a Broker or an Orchestrator, the Manager must ensure that the target VM is neither receiving nor processing clients’ requests before performing the scale in operation. Thus, the Manager executes a collection of operations to guarantee it. Algorithm 2 presents the pseudo-code of a scaling in operation, showing these operations. When removing a Broker VM, the Manager executes four steps. First, it configures all Orchestrators to stop sending requests to the Broker that will be removed (line 6). Second, it waits until the target Broker is idle and no longer processing client requests (line 9). This step is important to guarantee that clients’ requests will not be lost. In the next step, the Manager removes the Broker VM from the cloud (line 12). Finally, it notifies all Orchestrators to remove the Broker completely from their lists (line 13). When removing an Orchestrator, the Manager performs similar operations. First, it removes the Orchestrator from the DNS service (line 15) in order to guarantee that new client requests will not be processed by this particular Orchestrator. Then, as when removing a Broker, the Manager waits until the Orchestrator finishes processing clients’ requests (line 16). Only after that does it remove the Orchestrator VM from the cloud (line 19).
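The drain-before-remove ordering for a Broker can be sketched as below. The function `scale_in_broker`, the `in_flight` probe (number of requests the Broker is still processing), the `remove_vm` callback, and the polling interval are hypothetical; the final notification step of Algorithm 2 (line 13) is omitted for brevity.

```python
import time


def scale_in_broker(orchestrators, broker, in_flight, remove_vm,
                    poll_seconds=0.0):
    """Remove a Broker without losing requests (hypothetical sketch)."""
    # Step 1 (lines 4 to 8): stop routing new requests to the Broker.
    for orchestrator in orchestrators:
        if broker in orchestrator["brokers"]:
            orchestrator["brokers"].remove(broker)
    # Step 2 (lines 9 to 11): wait until the Broker has drained its work.
    while in_flight(broker) > 0:
        time.sleep(poll_seconds)
    # Step 3 (line 12): only now release the VM.
    remove_vm(broker)
```

Swapping steps 1 and 2 would be unsafe, because new requests could keep arriving while the Manager waits for the Broker to become idle.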

Application model

Brokel offers a two-level elasticity approach for both Brokers and Orchestrators. Thus, many Brokers and Orchestrators are available to process clients’ requests. Consequently, the flow of a request differs depending on its type (publish or subscribe). Figure 5 shows the processing flowchart for an arriving client request. First, a client asks the DNS service for an Orchestrator, and the service selects one of the active Orchestrator replicas employing a load balancing algorithm. Next, the client sends the message to the Orchestrator, which analyzes it to discover the request type. Depending on this type, the Orchestrator acts in two different ways. If the message is a subscribe request, the Orchestrator stores this information and forwards the request to all active Brokers. Storing this information in the Orchestrator is important in situations where the Brokel elasticity mechanism adds new Brokers: every new Broker needs to be updated with all subscriptions before it can operate properly. On the other hand, if the message is a publish request, the Orchestrator applies the load balancing algorithm to select the most suitable Broker to which the message is forwarded. This strategy prevents two different Brokers from processing the same client request. It also increases the processing capacity of the model.

Figure 5.

Message processing flowchart.

In the last phase of the flowchart, the message arrives at a Broker. The Broker analyzes its type and, if the message is a subscribe request, just stores this information in its table and the flow ends. However, if the message is a publish request, the Broker engine generates the client distribution list and sends it to the Orchestrator. Finally, the Orchestrator sends the message to all clients in this list and the flow ends.
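The Broker side of this flow can be illustrated with a small topic table. The sketch is hypothetical: the class `Broker`, the method `handle`, and the representation of clients as strings are assumptions, and real brokers such as Mosquitto or RabbitMQ implement this matching internally.

```python
from collections import defaultdict


class Broker:
    """Keeps a topic table and builds distribution lists (sketch)."""

    def __init__(self):
        self.table = defaultdict(set)  # topic -> clients subscribed to it

    def handle(self, kind, topic, client=None):
        """Process one request and return the client distribution list."""
        if kind == "subscribe":
            # Store the subscription; nothing has to be delivered.
            self.table[topic].add(client)
            return []
        if kind == "publish":
            # The returned list is what the Orchestrator uses for delivery.
            return sorted(self.table.get(topic, ()))
        raise ValueError(f"unknown request kind: {kind!r}")
```

A publish on a topic without subscribers yields an empty list, so the Orchestrator has nothing to deliver and the flow ends immediately.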

Elasticity decisions

When monitoring resources, cloud platforms typically use metrics from the operating system to determine the node workload. In this context, CPU load in percentage is the most used metric and is particularly pertinent for CPU-intensive applications.^29,32,33 Thus, the Brokel Manager monitors the CPU load of each VM periodically and computes the load for each level. If resource reorganization is necessary, the Manager performs elasticity actions using the cloud API. After completing the elasticity verification and action tasks, the Brokel Manager ends a monitoring observation and waits for the next monitoring cycle.

One role of the Manager is to compute the system load (l) in a way that minimizes the effect of disturbances or noise on the behavior of the target level. Thus, we apply a time-series technique, simple exponential smoothing (SES),³⁴ also referred to as weighted moving average,³⁵ over the CPU load metric of each VM. Equation (1) presents $l(o)$ as the level load at the $o$th monitoring observation considering n active VMs on the considered level. This equation is an arithmetic average of the load on each VM, which is computed through $l'(v, o)$. Here, v is a VM index, o is the current monitoring observation, and n is the number of VMs running application processes of one of the two levels (see equation (2)).

$$l(o) = \frac{\sum_{v = 0}^{n - 1} l'(v, o)}{n}$$

(1)

$$l'(v, o) = \begin{cases} \dfrac{cpu(v, o)}{2} & \text{if } o = 0 \\[2ex] \dfrac{l'(v, o - 1)}{2} + \dfrac{cpu(v, o)}{2} & \text{if } o \neq 0 \end{cases}$$

(2)

$l'$ is an SES average in which the current observation o has a stronger influence than observation $o - 1$ on the final value (the weights are $1 / 2$ for the current observation, $1 / 4$ for the previous one, $1 / 8$ for the one before it, and so on). The recurrence ends in the $cpu(v, o)$ computation, which returns the CPU load of VM v at observation o. The use of this smoothing technique is pertinent to avoid VM thrashing,²⁸ that is, false-positive or false-negative operations on VM scaling in or scaling out. A false-positive situation happens when an elasticity action is triggered after detecting a sudden load peak that exceeds some threshold. False-negative scenarios appear mainly in elasticity approaches that trigger resource reorganization only after a threshold has been exceeded either for a predefined number of consecutive observations or for a predefined time. In both situations, elasticity would be pertinent to adjust the load; however, it does not occur or it is delayed.
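Equations (1) and (2) translate directly into a few lines of code. The sketch below is an illustration only: the function names and, in particular, the lower and upper thresholds of 30% and 70% are hypothetical values chosen for the example and are not taken from the evaluation.

```python
def smoothed_load(cpu_samples):
    """l'(v, o) of equation (2) for the latest observation of one VM."""
    load = 0.0
    for cpu in cpu_samples:
        # Starting from 0.0 makes the first step equal to cpu / 2 (o = 0).
        load = load / 2 + cpu / 2
    return load


def level_load(samples_per_vm):
    """l(o) of equation (1): arithmetic mean of l' over the n VMs."""
    return sum(smoothed_load(s) for s in samples_per_vm) / len(samples_per_vm)


def decide(load, lower=30.0, upper=70.0):
    """Threshold-based decision; both thresholds are assumed values."""
    if load > upper:
        return "scale_out"
    if load < lower:
        return "scale_in"
    return "none"
```

As a worked case, a VM observed at 40%, 40%, and then a sudden 100% yields $l' = 20$, then $30$, then $65$. With the assumed upper threshold of 70%, the isolated peak does not trigger a scale out, whereas comparing the raw 100% sample with the threshold would have done so.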

Evaluation methodology

In this section, we present methodological aspects related to the Brokel evaluation, starting from the application. Next, we discuss both infrastructure and evaluation scenarios. Finally, we show the metrics used to evaluate the experiments in the proposed scenarios.

Application prototype

To evaluate Brokel, we designed two applications: (1) a Publisher application to produce and send messages to our platform and (2) a Subscriber application to receive these messages. The Publisher receives an input file containing a list of messages to produce, in which each line contains three parameters: (1) target topic, (2) data field 1, and (3) data field 2. Algorithm 3 shows the Publisher as a loop that processes each line of the file separately, so each line results in the publication of one message. The Subscriber, on the other hand, receives the same file but processes it differently. It reads each line of the file, retrieves the target topic, and sends a subscribe request to the server. Algorithm 4 illustrates the Subscriber. As an optimization, each topic is subscribed to only once.

Algorithm 3. Pseudo-code of the Publisher.
Data: Input file
Result: Publish messages
initialization;
for each line in file do
    topic = get_param(line, 1);
    data1 = get_param(line, 2);
    data2 = get_param(line, 3);
    message = concat(data1, data2);
    publish(topic, message, server_address);
end for

Algorithm 4. Pseudo-code of the Subscriber.
Data: Input file
Result: Topics subscriptions
initialization;
for each line in file do
    topic = get_param(line, 1);
    if find_topic(topic, topicList) == 0 then
        add_topic(topic, topicList);
        subscribe(topic, server_address);
    end if
end for
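As a rough illustration of Algorithms 3 and 4, the following Python sketch mirrors both loops. The semicolon field separator and the `publish` and `subscribe` callables are assumptions made for this example: the callables stand in for whatever broker client is used and are expected to already know the server address.

```python
from typing import Callable, Iterable, List, Tuple


def parse_line(line: str, sep: str = ";") -> Tuple[str, str, str]:
    """Split one input line into (topic, data1, data2); the separator is assumed."""
    topic, data1, data2 = line.rstrip("\n").split(sep, 2)
    return topic, data1, data2


def run_publisher(lines: Iterable[str], publish: Callable[[str, str], None]) -> int:
    """Algorithm 3: publish one message per input line; returns the message count."""
    count = 0
    for line in lines:
        topic, data1, data2 = parse_line(line)
        publish(topic, data1 + data2)  # message = concat(data1, data2)
        count += 1
    return count


def run_subscriber(lines: Iterable[str], subscribe: Callable[[str], None]) -> List[str]:
    """Algorithm 4: subscribe to each distinct topic exactly once."""
    topics: List[str] = []
    seen = set()
    for line in lines:
        topic, _, _ = parse_line(line)
        if topic not in seen:  # find_topic(topic, topicList) == 0
            seen.add(topic)
            topics.append(topic)
            subscribe(topic)
    return topics
```

Passing the broker operations as callables keeps the loops independent of any particular client library, in line with the broker-agnostic intent of the model.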

To simulate data input to our model, we used a data set³⁶ with over 21 million records. Each record describes the precise location of one of the 316 taxis of Rome with the following structure: taxi identification, location, and timestamp (date/time). The Publisher application uses the identification field as the topic and the location and timestamp as the message body. The data set covers all records gathered between 1 February 2014 and 2 March 2014.

Elasticity manager prototype and cloud infrastructure

We implemented a Brokel prototype for private clouds using OpenNebula (https://opennebula.org/) version 4.12.1. The Brokel Elasticity Manager was coded in Java and it uses the Java-based OpenNebula API for both monitoring and elasticity activities. Two image templates for the VMs were provided: one for the Orchestrators and another for the Brokers. The grain of an elasticity action is always a single VM, so we have an increase or decrease of 1 Orchestrator or Broker at every resource reorganization. Moreover, as presented in Sladescu and Fekete,¹⁴ we used the interval of 15 s to configure the Brokel Manager periodic monitoring activity, which is associated with the OpenNebula lowest bound index for periodical VM monitoring.

Regarding the cloud infrastructure, our cloud is composed of 11 homogeneous nodes (1 Front-End and 10 worker nodes), each with a 2.9 GHz dual-core processor and 4 GB of RAM, connected by a 100 Mbps network. These nodes run the Ubuntu Server 14.04 LTS operating system. In addition, we configured a shared data area in the cloud Front-End to store application files and VM images. Finally, our DNS was deployed using the BIND (https://www.isc.org/downloads/bind/) DNS server version 9.9.5.

Evaluation scenarios

To evaluate our solution, we designed a set of scenarios with different configurations, considering strategies that support either single-level or multi-level elasticity. In addition, we evaluate the performance of our application prototype without elasticity as a basis of comparison among the different scenarios. In particular, we base this strategy on the work of Barazzutti et al.,²¹ which first evaluated an application without elasticity support as a baseline. Thus, aiming at analyzing the impact of different configurations on application performance, we defined three scenarios: (s1) running the application with the lowest number of VMs and disabling the elasticity feature, (s2) running the application starting with the lowest number of VMs and enabling elasticity at the Broker level, and (s3) running the application starting with the lowest possible number of VMs and enabling multi-level elasticity. For all executions in these scenarios, the application started with a single VM per stage. In scenario s1, the goal is to obtain the results from the execution of the application without elasticity. These results represent traditional approaches, in which the application runs with the same amount of resources during the entire execution. In scenario s2, on the other hand, we aim to evaluate our model as a single-level elasticity solution, so we reorganized resources only at the Broker level. Finally, in scenario s3, we intend to evaluate our model with elasticity enabled at both the Broker and Orchestrator levels. This is the scenario that characterizes our model as a multi-level elasticity solution.

Regarding scenario s2, we used two Broker environments to model three subscenarios: (s2.1) VM replicas running Mosquitto Broker, (s2.2) VM replicas running RabbitMQ Broker, and (s2.3) 50% of VM replicas running Mosquitto Broker and 50% of VM replicas running RabbitMQ Broker. Furthermore, in these subscenarios, we first executed the application with the cloud service level agreement (SLA) set to allow a maximum of 6 VMs in the cloud. Later on, we ran the application again in these subscenarios with the cloud SLA set to allow a maximum of 10 VMs. Concerning elasticity parameters, 70% and 90% were used for the upper threshold ( $t_{u}$ ), whereas 30% and 50% were used for the lower threshold ( $t_{l}$ ). The combinations of these thresholds guided the elasticity in the executions of scenarios s2 and s3. In summary, there are 30 different executions when considering all scenarios and thresholds. Figure 6 presents all experiment scenarios and the parameters used.

Figure 6.

Evaluation scenarios: without elasticity in s1 and with elasticity in s2 and s3.
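The count of 30 executions can be checked by enumerating the configuration grid, as in the Python sketch below. The tuple layout is an assumption made for illustration; the s1 entries (one run per broker) and the s3 Broker limit of 6 VMs are taken from the result tables rather than from the scenario description itself.

```python
from itertools import product

# (lower threshold, upper threshold) pairs, in percent
THRESHOLDS = [(30, 70), (30, 90), (50, 70), (50, 90)]


def enumerate_executions():
    """List every (scenario, broker, VM limit, thresholds) execution."""
    runs = []
    # s1: elasticity disabled, a single VM, one run per broker
    for broker in ("Mosquitto", "RabbitMQ"):
        runs.append(("s1", broker, 1, None))
    # s2: Broker-level elasticity, three subscenarios x two SLA limits x four threshold pairs
    for broker, limit, th in product(("Mosquitto", "RabbitMQ", "Mix"), (6, 10), THRESHOLDS):
        runs.append(("s2", broker, limit, th))
    # s3: multi-level elasticity, one run per threshold pair
    for th in THRESHOLDS:
        runs.append(("s3", "Mix", 6, th))
    return runs
```

The enumeration yields 2 runs for s1, 24 for s2, and 4 for s3.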

To generate input load for scenarios s1 and s2, we instantiated 20 publisher processes and divided the first 400,000 records of our data set among them. In addition, we instantiated 20 subscriber processes to receive these data. In scenario s3, as multi-level elasticity is enabled, we doubled the input data to generate a higher throughput and, consequently, increase the load. In this way, we instantiated 40 publisher processes in scenario s3 and divided the first 800,000 records of our data set among them. We also instantiated 40 subscriber processes to receive these data.

Evaluation metrics

Focusing on performance and resource consumption, our evaluation analyzes the aforementioned scenarios against four metrics: time, messages per second, cost, and efficiency. The first represents the performance perspective and refers to the application execution time. The second is analogous to the throughput of our solution and is calculated by dividing the total number of messages sent to the system by the total execution time. The cost metric, in turn, represents the amount of resources the application used along the whole execution. To estimate the application cost, it is necessary to know the deployment time of each VM. In equation (3), s is the maximum number of allocated VMs and $p t_{e} (j)$ is the time spent running a configuration with j VMs. For example, consider the following situation: 10 s with 2 VMs, 60 s with 4 VMs, 50 s with 6 VMs, and 40 s with 4 VMs; here, we have $cost = 10 \times 2 + (60 \times 4 + 40 \times 4) + 50 \times 6 = 720$ . Finally, efficiency represents the occupation rate of the allocated resources along the execution, that is, the cost divided by the total execution time. In equation (4), s and $p t_{e} (j)$ have the same meaning as in equation (3).

cost = \sum_{j = 1}^{s} j \times p t_{e} (j)

(3)

efficiency = \sum_{j = 1}^{s} (\frac{j \times p t_{e} (j)}{time})

(4)
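Equations (3) and (4) can be sketched in a few lines of Python. The representation of an execution as a list of (number of VMs, seconds) pairs is an assumption made for this example; repeated VM counts are allowed and simply add up, which is equivalent to aggregating the time per j in $p t_{e} (j)$ .

```python
from typing import List, Tuple

Profile = List[Tuple[int, float]]  # (number of VMs, seconds spent with that many VMs)


def cost(profile: Profile) -> float:
    """Equation (3): sum of j * pt_e(j) over the execution."""
    return sum(vms * seconds for vms, seconds in profile)


def efficiency(profile: Profile) -> float:
    """Equation (4): cost divided by the total execution time."""
    total_time = sum(seconds for _, seconds in profile)
    return cost(profile) / total_time
```

For the worked example in the text, `cost([(2, 10), (4, 60), (6, 50), (4, 40)])` gives 720, and the corresponding efficiency is 720 / 160 = 4.5 VMs on average.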

Results

This section presents the results, which were organized in five subsections for better exploring each perspective of cloud elasticity applied to Pub/Sub systems.

Application time

Figure 7 illustrates the time spent handling the workload in each environment configuration. The y-axis refers to the type of Broker used and the replica limit, while the x-axis corresponds to the time measured in seconds; each bar refers to a threshold setting. In particular, the first value is the lower threshold ( $t_{l}$ ), while the second refers to the upper threshold ( $t_{u}$ ). For example, 50/90 refers to a lower threshold of 50% and an upper threshold of 90%. The environment with the Mosquitto broker and 1 VM obtained the best result among the executions in evaluation scenario s1 (without elasticity). This setting took 7,280 s (121.3 min) to finish. In this case, no load thresholds were used because cloud elasticity was disabled. In addition, RabbitMQ with 1 VM finished in 10,520 s (175.3 min), being 44.5% slower than the Mosquitto environment.

Figure 7.

Application execution time of all scenarios and configurations.

The percentage of reduction in execution time using Brokel is shown in Figure 8. In this figure, we present a comparison of Mosquitto with 6 VMs and 10 VMs against the deployment using a single VM. Moreover, the same was conducted for the RabbitMQ broker. We can perceive that Brokel reduced the time required for handling the workload in environments running the Mosquitto broker by up to 76.6%. The advantage was even greater when running with RabbitMQ, reaching 81.2% of time reduction. In general, the best result obtained using elasticity was with Mosquitto (10 VMs) when using $t_{l}$ and $t_{u}$ equal to 30% and 70%, respectively. This configuration was responsible for a conclusion time of 1,700 s. On the other hand, the worst case was obtained in the RabbitMQ (6 VMs) environment using 50% for $t_{l}$ and 90% for $t_{u}$ . The conclusion time here was 2,600 s. The upper threshold equal to 70% was responsible for allocating VMs faster, since the system does not execute with a load greater than this limit. Consequently, the larger the number of VMs, the shorter the execution time since we have a CPU-intensive incoming demand.

Figure 8.

Percentage of reduction in execution time.
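The percentages reported above follow directly from the execution times. A small Python sketch of the arithmetic is given below; the function names are chosen here for illustration only.

```python
def time_reduction(baseline_s: float, elastic_s: float) -> float:
    """Percentage of execution time saved relative to the non-elastic baseline."""
    return (1 - elastic_s / baseline_s) * 100


def slowdown(reference_s: float, other_s: float) -> float:
    """Percentage by which `other_s` is slower than `reference_s`."""
    return (other_s / reference_s - 1) * 100
```

Using the reported times, `time_reduction(7280, 1700)` rounds to 76.6 for Mosquitto, `time_reduction(10520, 1980)` rounds to 81.2 for RabbitMQ, and `slowdown(7280, 10520)` rounds to 44.5 for the single-VM comparison.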

Resource allocation

Table 2 shows the utilization time of the virtual resources in each environment configuration. For example, the Mosquitto environment with a limit of 10 VMs using $t_{l}$ equal to 50% and $t_{u}$ equal to 90% used 2 VMs for 160 s, 4 VMs for 440 s, 6 VMs for 180 s, 8 VMs for 200 s, and 10 VMs for 900 s, totaling 1,880 s of execution. Figure 9 displays the same utilization times graphically. Here, the y-axis refers to the type of Broker, the limit of replicas, and the thresholds used. The x-axis, in turn, corresponds to the time in seconds, and each segment of a stacked bar corresponds to the amount of occupied resources.

Table 2.

Time in seconds that the application ran with each resource configuration in each scenario.

Platform	Limit	Thresholds ( $t_{l}$ / $t_{u}$ )	1 VM	2 VMs	4 VMs	6 VMs	8 VMs	10 VMs	Total (s)
Mosquitto	1 VM	−	7,280	−	−	−	−	−	7,280
	6 VMs	30/70	−	180	160	1,600	−	−	1,940
		30/90	−	300	200	1,720	−	−	2,220
		50/70	−	240	240	1,600	−	−	2,080
		50/90	−	220	300	1,440	−	−	1,960
	10 VMs	30/70	−	120	220	240	220	900	1,700
		30/90	−	200	180	200	180	960	1,720
		50/70	−	180	160	160	160	1,120	1,780
		50/90	−	160	440	180	200	900	1,880
RabbitMQ	1 VM	−	10,520	−	−	−	−	−	10,520
	6 VMs	30/70	−	140	160	2,200	−	−	2,500
		30/90	−	200	180	2,060	−	−	2,440
		50/70	−	300	160	1,980	−	−	2,440
		50/90	−	380	200	2,020	−	−	2,600
	10 VMs	30/70	−	120	160	160	200	1,340	1,980
		30/90	−	160	220	160	200	1,480	2,220
		50/70	−	380	240	160	160	1,300	2,240
		50/90	−	220	180	160	200	1,280	2,040
Mix	6 VMs	30/70	−	160	160	1,980	−	−	2,300
		30/90	−	400	180	1,880	−	−	2,460
		50/70	−	220	240	1,860	−	−	2,320
		50/90	−	640	220	1,720	−	−	2,580
	10 VMs	30/70	−	180	160	160	160	1,340	2,000
		30/90	−	300	200	260	220	1,140	2,120
		50/70	−	520	160	160	220	1,100	2,160
		50/90	−	300	180	180	200	1,120	1,980

Figure 9.

Profile of VM configuration along the application execution in each scenario and configuration.

Figure 10 depicts the state of the thresholds at each monitoring point. The y-axis refers to the type of Broker, the allowed replica limit, and the thresholds used in the experiments. The x-axis corresponds to the number of monitoring observations, and each segment of a stacked bar represents a state of the thresholds. The smallest number of readings above $t_{u}$ recorded in the experiments, 39, was obtained in the following deployments: (i) Mosquitto with 10 VMs using $t_{l}$ of 50% and $t_{u}$ of 90%; (ii) Mix with 10 VMs using $t_{l}$ of 30% and $t_{u}$ of 90%; and (iii) Mix with 10 VMs using $t_{l}$ of 50% and $t_{u}$ of 90%. In these settings, the application spent the least time overloaded. On the other hand, the environment that spent the most time overloaded was RabbitMQ (6 VMs) with $t_{l}$ of 50% and $t_{u}$ of 90%, registering 98 readings above $t_{u}$ .

Figure 10.

Load status in all monitoring observations.

Throughput analysis

Figure 11 illustrates the rate of messages per second handled at each environment setting. The y-axis refers to the type of broker used and the allowed replica limits. The x-axis corresponds to the number of messages per second (Mps) and each bar of the graph means a threshold setting used. It is important to highlight that the messages have the same size for all executions.

Figure 11.

Messages per second handled by the application.

Mosquitto (1 VM) obtained the best result among the competitors in scenario s1 (without elasticity), reaching a rate of 54.95 Mps. The RabbitMQ (1 VM) environment achieved 38.02 Mps, which is 30.8% lower. With the limit of six VM replicas, the best result was obtained by the Mosquitto (6 VMs) environment using $t_{l}$ of 30% and $t_{u}$ of 70%, which reached 206.19 Mps. The worst case was obtained in the RabbitMQ (6 VMs) environment using 50% and 90% for $t_{l}$ and $t_{u}$ , respectively, resulting in 153.85 Mps.

With the replica creation limit set to 10 VMs, 235.29 Mps were handled in the Mosquitto (10 VMs) environment using $t_{l}$ of 30% and $t_{u}$ of 70%. The worst result for this limit was recorded in the RabbitMQ (10 VMs) environment, where 178.57 Mps were handled using $t_{l}$ of 50% and $t_{u}$ of 70%, which is 24.1% lower than the Mosquitto (10 VMs) environment. In general, the best result with elasticity appeared when employing Mosquitto (10 VMs) with 30% and 70% for the thresholds, reaching 235.29 Mps. The worst case was obtained in the RabbitMQ (6 VMs) environment using $t_{l}$ of 50% and $t_{u}$ of 90%, resulting in 153.85 Mps.
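The throughput values follow from dividing the number of published messages by the execution time. The Python sketch below reproduces some of them, assuming one message per record, that is, 400,000 messages for scenarios s1 and s2 as stated in the evaluation scenarios; the constant and function names are illustrative.

```python
MESSAGES_S1_S2 = 400_000  # records divided among the publishers in scenarios s1 and s2


def messages_per_second(total_messages: int, execution_time_s: float) -> float:
    """Throughput metric: total messages sent divided by total execution time."""
    return total_messages / execution_time_s
```

For instance, `messages_per_second(MESSAGES_S1_S2, 7280)` is approximately 54.95 Mps (Mosquitto, 1 VM) and `messages_per_second(MESSAGES_S1_S2, 1700)` is approximately 235.29 Mps (Mosquitto, 10 VMs, 30/70).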

Cost and efficiency

In Table 3, we present the cost of each execution. The cost corresponds to the total amount of resources used during the execution of the experiment, computed with equation (3). Thus, the idea is to minimize the cost. Figure 12 depicts the information presented in Table 3. Here, the y-axis refers to the type of broker used and the allowed replica limit. The x-axis, in turn, corresponds to the cost value, and each bar of the graph represents a threshold setting.

Table 3.

Cost results of all evaluation scenarios.

Platform	Resources	Thresholds ( $t_{l}$ / $t_{u}$ )
Platform	Resources	–	30/70	30/90	50/70	50/90
Mosquitto	1 VM	7,280	−	−	−	−
	6 VMs	−	10,600	11,720	11,040	10,280
	10 VMs	−	13,320	13,360	14,440	13,760
RabbitMQ	1 VM	10,520	−	−	−	−
	6 VMs	−	14,120	13,480	13,120	13,680
	10 VMs	−	16,840	18,560	16,960	16,520
Mix	6 VMs	−	12,840	12,800	12,560	12,480
Mix	10 VMs	−	16,640	16,120	15,400	15,200

VM: virtual machine.

Figure 12.

Total cost to execute the application in all scenarios.

The best cost result was obtained by the Mosquitto (6 VMs) environment with $t_{l}$ equal to 50% and $t_{u}$ equal to 90%, which used 10,280 resource units (VMs × seconds) to complete the execution. The worst case was observed in the RabbitMQ (10 VMs) environment with 30% and 90% for the thresholds, totaling 18,560 resource units during execution. Table 4, in turn, presents the results of the efficiency calculation for each evaluated environment. The efficiency corresponds to the average amount of resources (VMs) used in each second of execution. Thus, the idea is to minimize the number of resources to maximize the system efficiency. The best result was obtained by the Mosquitto (6 VMs) environment with $t_{l}$ of 50% and $t_{u}$ of 90%, resulting in an average of 5.245 VMs in each second of execution. The worst case was obtained in the RabbitMQ (10 VMs) environment with 30% and 70% for the thresholds, where an average of 8.505 VMs in each second was observed.

Table 4.

Efficiency results of all evaluation scenarios.

Platform	Resources	Thresholds ( $t_{l}$ / $t_{u}$ )
Platform	Resources	–	30/70	30/90	50/70	50/90
Mosquitto	1 VM	1.00	−	−	−	−
	6 VMs	−	5.464	5.279	5.308	5.245
	10 VMs	−	7.835	7.767	8.112	7.319
RabbitMQ	1 VM	1.00	−	−	−	−
	6 VMs	−	5.648	5.525	5.377	5.262
	10 VMs	−	8.505	8.360	7.571	8.098
Mix	6 VMs	−	5.583	5.203	5.414	4.837
Mix	10 VMs	−	8.320	7.604	7.130	7.677

VM: virtual machine.

Multi-level resource allocation

The main objective of scenario s3 is to evaluate the elastic performance at the Orchestrator level. To accomplish this, up to two Orchestrator VM replicas were used, as well as three Mosquitto Broker replicas and three RabbitMQ Broker replicas. Table 5 shows the utilization time of the virtual resources in each environment configuration. For example, the Broker setup with a limit of 6 VMs using $t_{l}$ equal to 50% and $t_{u}$ equal to 90% used 2 VMs for 1,900 s, 4 VMs for 280 s, and 6 VMs for 3,040 s, totaling 5,220 s of execution. During the same experiment, the Orchestrator level used 1 VM for 4,120 s and 2 VMs for 1,100 s, totaling the same 5,220 s of execution. Figure 13 corresponds to the resource allocation graph of the experiments in scenario s3.

Table 5.

Time in seconds that the application ran with each resource configuration in the multi-level scenario.

Level	Limit	Thresholds ( $t_{l}$ / $t_{u}$ )	1 VM	2 VMs	4 VMs	6 VMs	Total (s)
Broker	6 VMs	30/70	−	100	1,380	3,140	4,620
		30/90	−	200	1,400	3,320	4,920
		50/70	−	1,260	340	3,040	4,640
		50/90	−	1,900	280	3,040	5,220
Orchestrator	2 VMs	30/70	360	4,260	−	−	4,620
		30/90	480	4,440	−	−	4,920
		50/70	3,200	1,440	−	−	4,640
		50/90	4,120	1,100	−	−	5,220

Figure 13.

Profile of resource allocation in scenario s3.

Figure 14 shows CPU utilization from the Orchestrator viewpoint when running scenario s3 with 30% and 70% for the thresholds. Here, the blue line represents the total CPU allocated at the time of data collection, while the red line refers to the CPU usage. Figure 15 presents the same information as Figure 14, but for scenario s3 with 50% and 70% for the thresholds. These two figures were selected to demonstrate the impact that the threshold settings can have on an environment with elasticity.

Figure 14.

Resource utilization and allocation in scenario s3 with thresholds $t_{l}$ of 30% and $t_{u}$ of 70%.

Figure 15.

Resource utilization and allocation in scenario s3 with thresholds $t_{l}$ of 50% and $t_{u}$ of 70%.

In the experiment represented by Figure 14, the second Orchestrator replica is instantiated around observation 13. This action is performed because the total processing of the environment exceeds the value of $t_{u}$ . The environment is then maintained with two replicas until the end of the experiment. During the execution shown in Figure 15, the second Orchestrator replica is also instantiated around observation 13. However, the second replica is deactivated soon after its creation, as the total processing of the environment drops below $t_{l}$ . The activation and deactivation of the second replica are repeated until the end of the experiment. This behavior occurs because the load stays above $t_{u}$ with a single replica and falls below $t_{l}$ with two replicas.
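The oscillation described above can be reproduced with a toy simulation of a reactive threshold rule. The Python sketch below is only illustrative: the constant demand of 80% of one VM, the even split of load among replicas, the instantaneous VM startup, and the absence of load smoothing are all assumptions, not measured properties of the experiments.

```python
from typing import List


def simulate(demand: float, t_l: float, t_u: float,
             max_vms: int = 2, observations: int = 8) -> List[int]:
    """Return the number of replicas seen at each monitoring observation."""
    vms = 1
    history = []
    for _ in range(observations):
        load = demand / vms  # per-replica load in percent, assuming an even split
        history.append(vms)
        if load > t_u and vms < max_vms:
            vms += 1  # scale out: load above the upper threshold
        elif load < t_l and vms > 1:
            vms -= 1  # scale in: load below the lower threshold
    return history
```

Under these assumptions, thresholds of 30/70 keep the second replica once it is created (the per-replica load of 40% lies between the thresholds), whereas thresholds of 50/70 alternate between one and two replicas because 40% is below $t_{l}$ and 80% is above $t_{u}$ .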

Conclusion

This article presented Brokel, a multi-level elasticity model for Pub/Sub brokers that enables a performance-driven communication bus integrating applications, things, and users. Our idea is to address the growing IoT demand in an effortless way from the clients' viewpoint, enabling them to perceive an acceptable performance level when using the Pub/Sub system regardless of load and time of day. We introduced two elasticity levels, Broker and Orchestrator, in which the latter acts as a load balancing wrapper without the need to change existing Pub/Sub applications. Cloud elasticity is also offered as an agnostic service to brokers, that is, we only need to create VM templates for each target broker, allowing Brokel to work with any Pub/Sub broker (such as RabbitMQ or Mosquitto). Our scientific contribution relies on providing horizontal elasticity at both the Orchestrator and Broker levels, adding the possibility of using a geolocation DNS service to define the most suitable entry point (Orchestrator) to the messaging system. Another Brokel feature that we did not observe in the Pub/Sub state of the art is the Aging method for evaluating the load of both aforesaid levels, which avoids VM thrashing in elasticity actions.

To the best of our knowledge, there is no solution that proposes cloud elasticity at both the Broker and Orchestrator levels. Thus, we compared our solution under different configurations against non-elastic scenarios. Our results are encouraging and serve as a proof of concept for the use of elasticity in Pub/Sub systems. We evaluated Brokel by developing a prototype and a client application, testing both against different scenarios, brokers, and metrics. Comparing elastic and non-elastic environments, we highlight performance gains of 76.6% and 81.2% when using Mosquitto and RabbitMQ brokers, respectively. From the messages per second (Mps) viewpoint, we also observe throughput benefits: from 54.95 to 235.29 Mps for Mosquitto and from 38.02 to 202.02 Mps for RabbitMQ. From the resource metric perspective, the best result was an average of 5.24 VMs in each second of execution, obtained with a lower threshold of 50% and an upper threshold of 90%. These threshold values postpone resource allocation, since new resources are only instantiated after the system load crosses 90%. On the other hand, the worst result from the resource perspective happened when using 30% and 70% for the lower and upper thresholds, with an average of 8.51 VMs. In the elasticity scope, it is pertinent to observe that execution time and resource allocation are inversely proportional metrics.

Future research includes the evaluation of Brokel in real environments. We are studying the possibility of assessing it in post office agencies. Furthermore, future studies include additional evaluation of research strategies employing our solution. We also intend to improve network performance by investigating event notification strategies such as those proposed in the Scalable Internet Event Notification Architectures (SIENA) (http://www.inf.usi.ch/carzaniga/siena/).^6,7 Despite the encouraging results, the current version of Brokel uses reactive elasticity, in which lower and upper thresholds must be informed beforehand. Thus, we plan to explore proactive elasticity in order to drive elasticity actions in advance, therefore delivering the resources before entering an over- or under-provisioning state.

Footnotes

Academic Editor: Sang-Woon Jeon

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the following Brazilian agencies: CNPq, FAPERGS, and CAPES. The work was also partially supported by Finep, with resources from Funttel, grant no. 01.14.0231.00, under the Radiocommunication Reference Center (Centro de Referência em Radiocomunicações (CRR)) project of the Inatel, Brazil.

References

1. Gubbi, Buyya, Marusic, et al. Internet of Things (IoT): a vision, architectural elements, and future directions. Future Gener Comp Sy 2013; 29(7): 1645–1660.

2. Ngu, Gutierrez, Metsis, et al. IoT middleware: a survey on issues and enabling technologies. IEEE Internet Things 2017; 4(1): 1–20.

3. Santos, Rodrigues JJPC, Casal, et al. Intelligent personal assistants based on internet of things approaches. IEEE Syst J. Epub ahead of print 19 May 2016. DOI: 10.1109/JSYST.2016.2555292.

4. Yao, Sheng, Dustdar. Web-based management of the internet of things. IEEE Internet Comput 2015; 19(4): 60–67.

5. Wang, Qiu, et al. Scalable and elastic event matching for attribute-based publish/subscribe systems. Future Gener Comp Sy 2014; 36: 102–119.

6. Tarkoma. Scalable internet event notification architecture (SIENA). In: Proceedings of the research seminar on middleware for mobile computing, Helsinki, 20 February 2002. Helsinki: Department of Computer Science, University of Helsinki.

7. Carzaniga, Rosenblum, Wolf AL. Design and evaluation of a wide-area event notification service. ACM T Comput Syst 2001; 19(3): 332–383.

8. Botta, de Donato, Persico, et al. Integration of cloud computing and internet of things: a survey. Future Gener Comp Sy 2016; 56: 684–700.

9. Duran-Limon, Siller, Blair, et al. Using lightweight virtual machines to achieve resource adaptation in middleware. IET Softw 2011; 5(2): 229–237.

10. Giang, Blackstock, Lea, et al. Developing IoT applications in the fog: a distributed dataflow approach. In: Proceedings of the 2015 5th international conference on the Internet of Things (IOT), Seoul, South Korea, 26–28 October 2015, pp.155–162. New York: IEEE.

11. Truong, Dustdar. Principles for engineering IoT cloud systems. IEEE Cloud Comput 2015; 2(2): 68–76.

12. Robertazzi TG. Chapter 10: grids, clouds, and data centers. In: Robertazzi (ed.) Introduction to computer networking. Cham: Springer International Publishing, 2017, pp.113–127.

13. Chilipirea, Constantin, Popa, et al. Cloud elasticity: going beyond demand as user load. In: Proceedings of the 3rd international workshop on adaptive resource management and scheduling for cloud computing (ARMS-CC’16), Chicago, IL, 25–28 July 2016, pp.46–51. New York: ACM.

14. Sladescu, Fekete. Event aware elasticity control for cloud applications. Technical report 687, April 2012. Sydney, NSW, Australia: The University of Sydney.

15. Tran, Skhiri, Zimányi. EQS: an elastic and scalable message queue for the cloud. In: Proceedings of the 2011 IEEE 3rd international conference on cloud computing technology and science (CloudCom), Athens, 29 November–1 December 2011, pp.391–398. New York: IEEE.

16. Moore, Bean, Ellahi. Transforming reactive auto-scaling into proactive auto-scaling. In: Proceedings of the 3rd international workshop on cloud data and platforms (CloudDP’13), Prague, 14 April 2013, pp.7–12. New York: ACM.

17. Barazzutti, Felber, Fetzer, et al. StreamHub: a massively parallel architecture for high-performance content-based publish/subscribe. In: Proceedings of the 7th ACM international conference on distributed event-based systems, Arlington, TX, 29 June–3 July 2013, pp.63–74. New York: ACM.

18. Carella, Magedanz. Open Baton: a framework for virtual network function management and orchestration for emerging software-based 5G networks. Newsletter, 2016, http://resourcecenter.fd.ieee.org/fd/product/enewsletters/FDSDNNL0005

19. Wang, Liu, et al. LiveCloud: a lucid orchestrator for cloud datacenters. In: Proceedings of the 4th IEEE international conference on cloud computing technology and science, Taipei, Taiwan, 3–6 December 2012, pp.341–348. New York: IEEE.

20. Pham, Tchana, Donsez, et al. Roboconf: a hybrid cloud orchestrator to deploy complex applications. In: Proceedings of the 2015 IEEE 8th international conference on cloud computing, New York, 27 June–2 July 2015, pp.365–372. New York: IEEE.

21. Barazzutti, Heinze, Martin, et al. Elastic scaling of a high-throughput content-based publish/subscribe engine. In: Proceedings of the 2014 IEEE 34th international conference on distributed computing systems (ICDCS), Madrid, 30 June–3 July 2014, pp.567–576. New York: IEEE.

22. Kim, et al. A scalable and elastic publish/subscribe service. In: Proceedings of the 2011 IEEE international parallel distributed processing symposium (IPDPS), Anchorage, AK, 16–20 May 2011, pp.1254–1265. New York: IEEE.

23. Wang. A general scalable and elastic content-based publish/subscribe service. IEEE T Parall Distr 2015; 26(8): 2100–2113.

24. Zhong, Doshi, et al. Capability adaptive elastic IoT architecture. In: Proceedings of the 2015 IEEE international conference on smart city/SocialCom/SustainCom (SmartCity), Chengdu, China, 19–21 December 2015, pp.615–622. New York: IEEE.

25. Bellavista, Zanni. Towards better scalability for IoT-cloud interactions via combined exploitation of MQTT and CoAP. In: Proceedings of the 2016 IEEE 2nd international forum on research and technologies for society and industry leveraging a better tomorrow (RTSI), Bologna, 7–9 September 2016, pp.1–6. New York: IEEE.

26. Bellavista, Zanni. Scalability of Kura-extended gateways via MQTT-CoAP integration and hierarchical optimizations. In: Proceedings of the 11th EAI international conference on body area networks (BodyNets’16), Turin, 15–16 December 2016, pp.210–216. Brussels: Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering (ICST).

27. Ali-Eldin, Tordsson, Elmroth. An adaptive hybrid elasticity controller for cloud infrastructures. In: Proceedings of the 2012 IEEE network operations and management symposium (NOMS), Maui, HI, 16–20 April 2012, pp.204–212. New York: IEEE.

28. Bryant, Tumanov, Irzak, et al. Kaleidoscope: cloud micro-elasticity via VM state coloring. In: Proceedings of the 6th conference on computer systems (EuroSys’11), Salzburg, 10–13 April 2011, pp.273–286. New York: ACM.

29. Nikolov, Kächele, Hauck, et al. CLOUDFARM: an elastic cloud platform with flexible and adaptive resource management. In: Proceedings of the 2014 IEEE/ACM 7th international conference on utility and cloud computing (UCC’14), London, 8–11 December 2014, pp.547–553. Washington, DC: IEEE Computer Society.

30. Dutta, Gera, Verma, et al. SmartScale: automatic application scaling in enterprise clouds. In: Proceedings of the 2012 IEEE 5th international conference on cloud computing (CLOUD), Honolulu, HI, 24–29 June 2012, pp.221–228. New York: IEEE.

31. Lorido-Botran, Miguel-Alonso, Lozano. A review of auto-scaling techniques for elastic applications in cloud environments. J Grid Comput 2014; 12(4): 559–592.

32. Herbst, Kounev, Weber, et al. BUNGEE: an elasticity benchmark for self-adaptive IAAS cloud environments. In: Proceedings of the 10th international symposium on software engineering for adaptive and self-managing systems (SEAMS’15), Florence, 18–19 May 2015, pp.46–56. Piscataway, NJ: IEEE Press.

33. Lim, Babu, Chase, et al. Automated control in cloud computing: challenges and opportunities. In: Proceedings of the 1st workshop on automated control for datacenters and clouds (ACDC’09), Barcelona, 19 June 2009, pp.13–18. New York: ACM.

34. Herbst, Huber, Kounev, et al. Self-adaptive workload classification and forecasting for proactive resource provisioning. In: Proceedings of the 4th ACM/SPEC international conference on performance engineering (ICPE’13), Prague, 21–24 April 2013, pp.187–198. New York: ACM.

35. Yazdanov, Fetzer. Vertical scaling for prioritized VMs provisioning. In: Proceedings of the 2012 2nd international conference on cloud and green computing, Xiangtan, China, 1–3 November 2012. Washington, DC: IEEE Computer Society.

36. Bracciale, Bonola, Loreti, et al. CRAWDAD dataset roma/taxi (v. 2014-07-17), 2014, http://crawdad.org/roma/taxi/20140717