Abstract
With the development of Web-based services and related technologies, today’s supervisory control and data acquisition is regarded as an Internet of things service system for industrial infrastructures, and keeping open supervisory control and data acquisition systems in a secure and safe state at runtime has become a critical and mandatory requirement. Existing host-based monitoring automata are vulnerable because “inside” malware may compromise and subvert the monitoring mechanism itself, while virtual machine–based monitoring cannot provide observable running traces of the protected services because these services are isolated from the runtime monitor. In this article, we propose a non-intrusive solution that guarantees the runtime state of open supervisory control and data acquisition systems. In this solution, the running traces of protected services are obtained in an “out-of-box” framework built on abstract execution of network events on Internet of things service models and on virtual machine semantic reconstruction of the protected services. In addition, a property checking procedure checks the states of physical devices in advance to guarantee that runtime behavior complies with the security policies of open supervisory control and data acquisition systems. In this way, the solution provides fine-grained protection for open supervisory control and data acquisition systems and keeps physical devices running safely.
Introduction
Supervisory control and data acquisition (SCADA) systems are deployed worldwide in many critical infrastructures, ranging from power generation and public transport to industrial manufacturing, for monitoring and controlling physical processes through a network of meters and sensors. In such systems, most digital control actions are executed by sensors and/or actuators after the digital commands are converted to sensor signals by programmable logic controllers (PLCs) or remote terminal units (RTUs), while real-time sensing data are sent to the supervisory system of the SCADA after sensor signals are converted to digital data. With the development of Web-based services and sensor technology, today’s SCADA can be viewed as an evolving paradigm that is becoming more efficient and covers wider areas, and it is regarded as an Internet of things (IoT) service system for industrial infrastructures. As physical threats, cyber-based attacks, and the sophistication of critical infrastructure systems grow, keeping open SCADA systems in a secure and safe state at runtime becomes a critical and mandatory requirement.
For instance, in a smart grid, one transmission line fault may bring about a chain reaction and, finally, a blackout. Preventing power disturbances from cascading into large blackouts requires real-time and complete visibility of the power system. The relevant information must be quickly retrieved and analyzed over a communication foundation equipped with time-synchronized phasor measurement units, phasor data concentrators, and so on.
The GridStat (Washington State University) project 1 adopted a publish/subscribe paradigm to build the communication foundation for smart grids to deliver coherent and real-time data, where a data consumer describes his or her interest by subscription without knowing who produces the data, and a data producer publishes data without knowing who subscribes to it. A consumer’s subscription is defined by an event name, called a topic, and the communication foundation delivers data carrying that event name to the subscribers. Although the GridStat researchers provided a communication architecture for smart grids, they did not discuss the security issues of such foundations or the corresponding impact on the services running over them.
Before addressing these security issues, some system models should be established because the semantics of these command events are always bound to a specific system. Figure 1 shows a publish/subscribe-based open SCADA system with six services: acquisition and estimation (AE), event analysis (EA), human machine interface (HMI), control agent (CA), resource model (RM), and warning service. Such a system has two security requirements:
The control actions may be sequentially executed through the communication foundation, but adversaries may interleave these command events (messages) to endanger the system.
Adversaries may launch an invalid action to endanger some devices. Runtime verification (RV) is therefore mandatory to ensure that the system devices will be safe after executing the actuation command and that the event was really published by a valid launcher.
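The decoupling between producers and consumers described above can be illustrated with a minimal sketch. It is not GridStat code: the `TopicBroker` class, its methods, and the topic strings are hypothetical names chosen only for illustration, and delivery is synchronous and in-process for simplicity.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, object], None]


class TopicBroker:
    """Toy topic-based publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        # Maps a topic (event name) to the callbacks subscribed to it.
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # A consumer states its interest by topic, not by producer identity.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, data: object) -> int:
        # A producer publishes without knowing who, if anyone, is subscribed.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, data)
        return len(callbacks)  # number of subscribers the data was delivered to


received = []
broker = TopicBroker()
broker.subscribe("grid/line7/phasor", lambda topic, data: received.append((topic, data)))
delivered = broker.publish("grid/line7/phasor", {"voltage_kv": 229.8})
ignored = broker.publish("grid/line9/phasor", {"voltage_kv": 230.1})
```

Because the broker neither authenticates publishers nor checks the order of events, anyone able to reach it can publish under any topic, which is the kind of weakness the two security requirements address.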

Control-related services in a SCADA system.
In traditional host-based approaches to protecting running systems, runtime monitors were often embedded in the protected system as one of its functions to check and modify each of its actions according to security policies, or they operated on the logs of the protected system; either way, they may be subverted by malware inside the system (Figure 2(a)). To address this problem, recent solutions based on VM technologies advocate placing runtime monitors outside the VM (Figure 2(b)). However, the existing virtual machine introspection (VMI) 2 and semantic reconstruction 3 methods cannot directly provide the desired security guarantees because of the complex protection goals.

Host-based versus VM-based runtime monitoring: (a) host-based runtime monitor and (b) VM-based runtime monitor.
To implement non-intrusive runtime monitoring and enforcement, we propose a protection framework based on VM technologies consisting of four functionalities:
This article is based on work first presented at the IEEE International Conference on Web Services in 2015. 6
Preliminaries
IoT services
In an open SCADA system, multiple resources are involved and multiple IoT services are coordinated to complete the overall functions. We use the Communication Diagram of UML 2.5 7 to specify the interactions among the IoT resources and services.
A communication diagram consists of a set of Lifelines (or Objects) and a set of Messages associated with different lifelines. It mainly focuses on the interaction between Lifelines. A message is a communication between a sender and a receiver. This communication involves two events: the event of sending the message and the event of receiving the message. Every message is labeled with a sequence expression that denotes the possible occurrence order of the message.
For example, we give a communication diagram in Figure 3 to demonstrate a remote control procedure and data acquisition procedure in a simplified SCADA example. In Figure 3, there are four threads collaborating to complete services interactions which are, respectively, labeled with prefixes

Communication diagram for a simplified SCADA example.
Definition 1
A communication diagram
→ is a total order on
Semantic reconstruction
To realize non-intrusive protection mechanisms, VM-based techniques are used to place runtime monitors outside the protected VM. 3 Such monitors adopt the VMI methodology to acquire low-level VM states externally.
Garfinkel and Rosenblum 2 describe VMI as a non-intrusive approach to inspect a VM from the outside and analyze the software running inside it, which can inspect the low-level VM states from the hypervisor level without perturbing the VM’s execution. This mechanism is supported and facilitated by a hypervisor or a virtual machine monitor (VMM), which is a thin layer of software that allows an operating system (OS) to run as a guest in a VM while maintaining control of the physical resources. A VMM provides the abilities of virtualization, isolation, and inspection. With such a VMI-based architecture, the VMM is able to gain a complete and untainted view of all system states in a VM.
When the (binary) low-level VM states are externally obtained, semantic reconstruction should be carried out to get the (user-readable) semantic-level view of the VM to bridge the semantic gap between machines and end users. Some object templates of guest OS are used to interpret these low-level VM states. For example, using guest memory data structures such as process control blocks (PCB) and functions as a template to link to the physical memory pages allocated to a VM by the VMM, each individual running process in the VM can be externally reconstructed.
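The template-based interpretation described above can be sketched as follows. This is a toy illustration, not a real guest OS layout: the `PCB` record format (field order, sizes, and the use of offset 0 as the end-of-list marker) is an assumption made up for the example, and the "memory snapshot" is a byte array built by hand.

```python
import struct
from typing import List, Tuple

# Hypothetical PCB template (NOT a real guest OS structure), little-endian:
#   pid   : uint32
#   state : uint32
#   next  : uint64  offset of the next PCB in the snapshot (0 marks the end)
#   name  : 16 bytes, NUL-padded process name
PCB = struct.Struct("<IIQ16s")


def reconstruct_processes(memory: bytes, head_offset: int) -> List[Tuple[int, str]]:
    """Walk the linked list of PCBs in a raw snapshot and return (pid, name) pairs."""
    processes = []
    seen = set()  # guards against a corrupted list that loops back on itself
    offset = head_offset
    while offset != 0 and offset not in seen:
        seen.add(offset)
        pid, _state, next_offset, raw_name = PCB.unpack_from(memory, offset)
        processes.append((pid, raw_name.rstrip(b"\x00").decode("ascii")))
        offset = next_offset
    return processes


# Build a fake 256-byte "memory snapshot" holding two PCBs at offsets 64 and 128.
snapshot = bytearray(256)
PCB.pack_into(snapshot, 64, 101, 0, 128, b"hmi_service")
PCB.pack_into(snapshot, 128, 102, 1, 0, b"control_agent")

processes = reconstruct_processes(bytes(snapshot), 64)
```

The reconstruction only reads bytes through the template, so it needs no cooperation from software inside the guest; in practice the template must match the exact guest kernel version.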
Because the semantic reconstruction is carried out outside the target VM, no software running inside the VM can affect the external reconstruction of the VM semantic view, provided that the VMM itself is assumed to be secure or is armed with some enhanced hardware. Malware is able to compromise arbitrary entities and facilities inside the VM and subvert anti-malware systems residing in the guest host. However, it is not able to break out of the VM and tamper with the reconstructed view.8,9
Basic framework
The non-intrusive solution based on an “out of the box” approach has the framework shown in Figure 4, where one or more IoT services are deployed on each VM, and multiple VMs holding different IoT service instances are connected by the communication foundation and collaborate to manage the physical industrial infrastructures. These IoT services publish and subscribe real-time data events, alarm events, and control command events by

The protection framework for open SCADA systems.
The framework works on the hypothesis that
The data on local point-to-point communication channels from local devices (e.g.
The sensor networks are implemented with protection mechanisms such as redundancy to provide trustworthiness of the reported sensor data, which means that the security in sensor level of such SCADA systems is not our focus.
However, for other communications in an open environment, for example,
The basic working process of the monitor is as follows:
All VMs run by the VMM have their own virtual Ethernet cards whose communications are all through a virtual network bridge (vbr). Our monitor intercepts Internet packets on the vbr and then uses the
According to the extracted events, the monitor symbolically executes IoT services based on the IoT service model. Then, an
The service trace estimated by the
According to the snapshot, we refine the service traces in the
If a milestone event is extracted from vbr, it triggers the VM semantic reconstruction immediately and a
If the
If the
Traces simulation and refinement
RV 10,11 (also called runtime monitoring, runtime checking, runtime analysis, dynamic analysis, etc.) is a lightweight and successful technique for monitoring the behavior of a running system and possibly reacting to observed behaviors that satisfy or violate certain properties. When the target program is executed, the monitoring system observes the execution traces of the program and checks whether the specification is fulfilled or violated.
In our protection framework, because the target SCADA services are running in VMs and the monitor is running on the host, the actual traces are difficult, or even impossible, for monitors to observe directly. We therefore obtain simulated system execution traces by abstractly executing network events on IoT service models and then refine them according to partial traces from VM semantic reconstruction. After the events on a service trace are identified, they should be verified for compliance with the global system behavior specification. We accomplish this work with the following procedures.
Event extract
To share events among different IoT services, events are named, for example,
The
With packet filtering tools, all IP packets through the VMM’s vbr can be intercepted and interpreted.
According to the name trees, only the root name of a tree is sought in the packets in order to keep throughput high. This is sound because every event name includes its root name. When nothing matches, the packets pass through without further processing. Otherwise, go to step 3).
Identify events from the packets with the event schema binding to the event name.
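The filtering idea in the steps above (search only for root names first, then bind a matched packet to an event schema) can be sketched as follows. Everything concrete here is an assumption for illustration: the dotted event names, the JSON payload encoding, and the `ROOT_NAMES` and `EVENT_SCHEMAS` tables do not come from the original system.

```python
import json
from typing import Dict, Optional, Set, Tuple

# Hypothetical roots of the event name trees and hypothetical event schemas.
ROOT_NAMES = [b"scada.control", b"scada.alarm"]
EVENT_SCHEMAS: Dict[str, Set[str]] = {
    "scada.control.breaker.open": {"device", "issuer"},
}


def extract_event(payload: bytes) -> Optional[Tuple[str, dict]]:
    """Return (event name, fields) if the payload carries a known event, else None."""
    # Cheap pre-filter: every event name contains its root name, so a payload
    # without any root name cannot contain an event and passes straight through.
    if not any(root in payload for root in ROOT_NAMES):
        return None
    # Full identification: decode the payload and check it against the schema
    # bound to the event name.
    try:
        message = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(message, dict):
        return None
    name, fields = message.get("event"), message.get("fields")
    if not isinstance(name, str) or not isinstance(fields, dict):
        return None
    required = EVENT_SCHEMAS.get(name)
    if required is None or not required.issubset(fields):
        return None
    return name, fields


valid = b'{"event": "scada.control.breaker.open", "fields": {"device": "B1", "issuer": "HMI"}}'
event = extract_event(valid)
```

Most traffic fails the substring test and never reaches the JSON decoder, which is what keeps the common path cheap.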
Abstract execution
After one event is extracted in the
The basic procedure of
For one event
If the event
If
After one event is linked to the corresponding trace, previous unprocessed events stored in the list
By repeating the above steps, the service traces are simulated as more and more events are processed.
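A much-simplified sketch of this loop is given below. It is not Algorithm 1: the service model is reduced to a single linear sequence of event names (an assumption; real service models branch and run several threads), and the class and method names are hypothetical. It only illustrates linking an event to a trace and then retrying the previously unprocessed events.

```python
from typing import List


class TraceSimulator:
    """Simulates one service trace against a linear service model (toy version)."""

    def __init__(self, model: List[str]) -> None:
        self.model = model            # expected order of events for the service
        self.trace: List[str] = []    # events linked to the simulated trace so far
        self.pending: List[str] = []  # events that could not be linked yet

    def _try_link(self, event: str) -> bool:
        # An event can be linked only if it is the next event the model expects.
        position = len(self.trace)
        if position < len(self.model) and self.model[position] == event:
            self.trace.append(event)
            return True
        return False

    def on_event(self, event: str) -> None:
        if not self._try_link(event):
            self.pending.append(event)  # keep it for a later retry
            return
        # The trace advanced, so previously unprocessed events may now fit.
        progressed = True
        while progressed:
            progressed = False
            for buffered in list(self.pending):
                if self._try_link(buffered):
                    self.pending.remove(buffered)
                    progressed = True


simulator = TraceSimulator(["recv_cmd", "check_auth", "send_actuation"])
for name in ["check_auth", "recv_cmd", "bogus", "send_actuation"]:
    simulator.on_event(name)
```

In this run the out-of-order `check_auth` is buffered and linked once `recv_cmd` arrives, while `bogus` never fits the model and stays in the pending list.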
Algorithm 1 shows the procedure. In this algorithm, primitive
Refine state
The traces output by the
In our solution, a dynamic and volatile data acquisition and analysis is adopted to reduce the time consumed in semantic reconstruction. After the VM’s memory is acquired, current states of all running services are externally reconstructed from the memory snapshot of one or more VMs. The reconstructed runtime state of each process is stored in a stack structure and corresponds to a partial trace of the corresponding service model. Then, the current simulated traces are refined with the reconstructed traces according to the following rules for
All subroutines or functions in calling in a given process stack space are reconstructed from a VM snapshot and stored in a stack structure
Getting the top element
Repeat the above two steps until all traces
The process is described in Algorithm 2. In the algorithm, method
Behavior checking
The
In the
In many cases, different services often interact with each other to accomplish some functional operations. If the event is interactive between two processes, we may find it in the stack space of another process, and a trace includes the event can be constructed. In such a case, it is obvious that if there is a sending event
From the procedure of
For an milestone event
If the event is found, the milestone event in
Repeat the above steps, until all traces are processed.
This process works on the basis of correctness and reliability of the running VMs, which can be ensured by monitoring and clearing exceptions on system call table, active process list, and network ports of running VMs using VMI methodology.
After the traces are processed, we verify whether they are in accordance with the global behavior specification (causality and conflict relations) of the system. Taking all events in the traces and the specification into account may be time-consuming and unnecessary. Thus, we only consider the behavior of milestone events to reduce the time consumed.
To check causality and conflict relations among milestone events, the current traces that include milestone events are grouped first. Then, the traces in one group are connected according to the order of event occurrence within one service and the sending-receiving relation between different services, where each event is a vertex in a directed trace graph (multiple events with the same name are drawn as one vertex) and the edges represent causality. Once the graph is constructed, all non-milestone events are removed and the edges are merged to obtain a pruned directed graph. After that, we check the edges of the graph against the defined causality relations and the vertices against the conflict relations.
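The pruning and checking step can be sketched as follows, under stated assumptions: the trace graph is acyclic and given as a set of directed edges, every edge of the pruned graph must appear in the allowed causality relation, and a conflict is violated when both events of a conflicting pair occur in the graph. The function names and the breaker events are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]


def prune_to_milestones(edges: Set[Edge], milestones: Set[str]) -> Set[Edge]:
    """Remove non-milestone vertices, merging edges so that causality is preserved."""
    graph = set(edges)
    vertices = {vertex for edge in graph for vertex in edge}
    for vertex in sorted(vertices - milestones):
        predecessors = {a for a, b in graph if b == vertex and a != vertex}
        successors = {b for a, b in graph if a == vertex and b != vertex}
        # Drop every edge touching the vertex, then connect its predecessors
        # directly to its successors.
        graph = {(a, b) for a, b in graph if vertex not in (a, b)}
        graph |= {(p, s) for p in predecessors for s in successors}
    return graph


def satisfies_specification(
    pruned: Set[Edge], causality: Set[Edge], conflicts: Set[FrozenSet[str]]
) -> bool:
    vertices = {vertex for edge in pruned for vertex in edge}
    unexpected_edges = pruned - causality                     # causality violations
    violated = {pair for pair in conflicts if pair <= vertices}  # both conflicting events occur
    return not unexpected_edges and not violated


trace_edges = {
    ("select_breaker", "log_request"),
    ("log_request", "open_breaker"),
    ("open_breaker", "send_ack"),
}
milestones = {"select_breaker", "open_breaker", "close_breaker"}
pruned = prune_to_milestones(trace_edges, milestones)

causality = {("select_breaker", "open_breaker"), ("select_breaker", "close_breaker")}
conflicts = {frozenset({"open_breaker", "close_breaker"})}
ok = satisfies_specification(pruned, causality, conflicts)
```

Checking only the pruned graph keeps the cost proportional to the number of milestone events rather than to the full trace length.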
For an event satisfying system behavior specification,
Property checking and enforcement
In the
The current event is an actual and valid event of the SCADA system, which satisfies the system specification and should be executed by IoT services or field devices normally;
Or, the event is an actual event from the SCADA system but may endanger the system, for example,
Or, it may be a forged event from adversaries, which must be prevented when it is dangerous, for example,
The remaining task is therefore to ensure that execution complies with the system security specifications and to avoid dangerous operations on services and physical devices. This can be fulfilled by runtime enforcement monitoring.
As previously mentioned, in our protection framework the actual traces are difficult for monitors to observe directly, so we construct refined traces by abstract execution and VM semantic reconstruction. However, the refined traces only approximate the actual ones and are not identical to them. Existing verification mechanisms that consider only the action sequences of current executions may therefore not work well in such a situation.
In order to implement such verification on the basis of some proven work, we introduce the term
Given a set
Event
Event
If there is a trustable alternative event
Figure 5(b) shows how we think of a monitor interposing on and enforcing the validity of actions executed to secure target systems. The idea is based on the mandatory results automata (MRAs) in Figure 5(a), 4,5 which were proposed by Dolzhenko, Ligatti, and Reddy to model the RV of applications. MRAs take both input actions and output actions into account and enforce security policies by transforming actions and results of these actions with an

Monitoring mechanism based on MRAs: (a) monitoring mechanism of security monitor MRA and (b) security monitor MRA based on trustable actions.
Like other security automata and edit automata, MRAs take all observed actions from target applications as input to the program monitor and do not consider possible forged events attempting to operate on systems. Our mechanism differs from the original MRAs in the input actions (to the monitor) and the results returned (from the monitor). We take the actions in refined traces together with a set of trustable actions as input to improve the reliability of monitoring. Similarly, MRAs output the results of input actions, whereas we output the updated set of trustable actions as well as the results.
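The change in the monitor's interface can be illustrated with a small sketch. It is not the formal MRA definition: the function name, the rule that an untrusted action is suppressed, and the rule that a trusted action is consumed once executed are all assumptions chosen to show one possible way the trustable set could be updated.

```python
from typing import Callable, FrozenSet, Optional, Tuple


def monitor_step(
    action: str,
    trustable: FrozenSet[str],
    execute: Callable[[str], str],
) -> Tuple[Optional[str], FrozenSet[str]]:
    """One monitoring step: returns (result, updated trustable set).

    Input:  an action from the refined trace plus the current trustable set.
    Output: the result of the action (None if suppressed) plus the updated set.
    """
    if action not in trustable:
        # Possibly forged or unverified event: suppress it, leave the set unchanged.
        return None, trustable
    result = execute(action)  # forward the action to the system and obtain its result
    # Assumed update rule: a trusted action is used up once it has been executed.
    return result, trustable - {action}


executed = []


def fake_system(action: str) -> str:
    executed.append(action)
    return "done:" + action


result, remaining = monitor_step("open_breaker", frozenset({"open_breaker"}), fake_system)
replayed, unchanged = monitor_step("open_breaker", remaining, fake_system)
```

With this update rule a replay of the same command is suppressed because the trustable set no longer contains it, so the system sees the action only once.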
Definition of MRAs with constraint
Here, we present the basic notations and definitions of MRAs constrained by trustable actions and, using two examples, detail how the automata enforce security policies on open SCADA systems.
A system is abstractly defined in terms of the system actions
Definition 2
Given a trustable action set
The operational semantics of the monitor is the same as that of MRAs, whose single-step semantics is defined in Figure 6.

Single-step operational semantics of MRAs.
Enforcement examples
We give two examples of how MRAs with constraints on trustable actions work and enforce policies.
Example 1
Consider a simple remote control service in a simplified SCADA system for turning on/off breakers. Assuming the service contains the following three methods:
Suppose we wish to require that
In the remote control service of SCADA system, if a control action
Associated with each event
The count of current event
The count is more than 3, but the duration between event
The policy
where
An MRA monitor
If
In this example, we can ensure the states of physical devices are enforced in safety by the automata
Here, we give another example.
Example 2
In a remote control service, any remote control action
Obviously, a remote control action may be emitted and executed if there is a trustable
Given MRA
The function works as expected:
From Example 2 we can see, if there is no trustable
Testing and evaluation
Figure 7 shows the overall process of the non-intrusive solution to open SCADA systems as well as the relations among different components of the protection framework.

The overall process of the non-intrusive solution.
Experimental analysis
The experiment environment is established as follows. The server, responsible for receiving events, abstractly executing the events, and proceeding the checking procedure, is running in a VM on host
Sends sequential events according to their occurrence in traces;
Sends out-of-order events not according to their occurrence in traces;
Sends sequential events, as well as randomly sending some milestone events using another thread to simulate the malware’s vicious operation.
Our monitor works in host
Netfilter is a framework provided by the Linux kernel that allows various networking-related operations to be implemented in the form of customized handlers. Netfilter represents a set of hooks inside the Linux kernel, allowing specific kernel modules to register callback functions with the kernel’s networking stack. The basic procedure of matching event topics in the modified Netfilter is as follows:
When an IP packet is captured, the protocol type, for example,
After the type of the packet is determined, the data length in the packet is computed and the data are extracted.
According to the predefined
If there is a valid
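The matching steps above run inside a kernel hook in the original work. The sketch below is only a user-space analogue that shows the same sequence on a raw IPv4 packet: determine the protocol, compute where the data start, extract the data, and look for a topic. The helper names, the topic list, and the test packet are hypothetical.

```python
import struct
from typing import List, Optional

IPPROTO_TCP, IPPROTO_UDP = 6, 17
TOPIC_ROOTS = [b"scada.control", b"scada.alarm"]  # hypothetical event topics


def extract_payload(packet: bytes) -> Optional[bytes]:
    """Return the TCP/UDP payload of a raw IPv4 packet, or None if not applicable."""
    if len(packet) < 20:
        return None
    version_ihl, _tos, total_length = struct.unpack_from("!BBH", packet, 0)
    ip_header_len = (version_ihl & 0x0F) * 4
    if version_ihl >> 4 != 4 or ip_header_len < 20:
        return None
    protocol = packet[9]  # protocol field of the IPv4 header
    if protocol == IPPROTO_UDP:
        transport_len = 8  # UDP header has a fixed size
    elif protocol == IPPROTO_TCP:
        if len(packet) < ip_header_len + 13:
            return None
        transport_len = (packet[ip_header_len + 12] >> 4) * 4  # TCP data offset
    else:
        return None
    return packet[ip_header_len + transport_len:total_length]


def match_topic(payload: bytes, topics: List[bytes]) -> Optional[bytes]:
    """Return the first topic found in the payload, or None."""
    return next((topic for topic in topics if topic in payload), None)


def build_udp_packet(payload: bytes, protocol: int = IPPROTO_UDP) -> bytes:
    """Build a minimal IPv4 + UDP packet (checksums left at 0) for the example."""
    total_length = 20 + 8 + len(payload)
    ip_header = struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, total_length, 0, 0, 64, protocol, 0,
        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),
    )
    udp_header = struct.pack("!HHHH", 5000, 5001, 8 + len(payload), 0)
    return ip_header + udp_header + payload


data = b"scada.control.breaker.open|B1"
packet = build_udp_packet(data)
topic = match_topic(extract_payload(packet) or b"", TOPIC_ROOTS)
```

A kernel module would perform the same header arithmetic on the socket buffer and return a verdict instead of a value; the offsets used here follow the standard IPv4, UDP, and TCP header layouts.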
Figure 8 shows the time spent on event extraction and the checking procedure, expressed as the number of packets processed per second for different packet lengths. We experimented with packet lengths of about 100, 300, and 500 bytes and recorded the processing time 19 times. Figure 8 shows that the number of packets processed per second decreases as the packet length increases. In multiple experiments in which the client sent events continuously, no packets were lost and all events not satisfying the behavior specification were prevented.

Time spent on event extraction and checking.
Semantic reconstruction
We use
We can access the stack address space based on the kernel memory structure
By traversing the active process list
With the
Reading and decoding data in corresponding areas (with
Volatility provides a plugin
The experimental environment of semantic reconstruction is listed in Table 1. In this experiment, we cyclically analyzed a stack space of a
Experimental environment for the semantic reconstruction.
VM: virtual machine; OS: operating system; VMM: virtual machine monitor.

Time consumed by semantic reconstruction from a process’s stack address space.
After the functions are found, we arrange the function names in the order of their execution and remove the functions that are of no concern, to form the partial trace of the process. All partial traces are then used to refine the simulated traces output by
Time complexity of algorithms
Because the efficiency of the algorithms depends on the specific system and the size of the inputs, we analyze each algorithm to obtain its worst-case time complexity.
Assuming that
For the
In Algorithm 2
So, the time complexity for the whole trace simulation and refinement procedure is to be
Enforcement analysis
To process a sequence of

Automaton for policy
From Figure 10 we can see that as long as
From the process of automata
Figure 11 depicts an automaton for policy

Automaton for policy
Related works
VMI has been a critical part of many recent virtualization-based approaches to security. By running vulnerable systems as VMs and moving security software from inside the VMs to outside, the out-of-VM solutions securely isolate the security software from the vulnerable system. First introduced by Garfinkel and Rosenblum, 2 introspection allows security software to gain an understanding of the current state of the guest VM. Pfoh et al. 14 defined a formal model for describing VMI techniques, which helps in examining and reasoning about possible VMI approaches. Srinivasan et al. 9 presented a process out-grafting approach for process-level execution monitoring (EM) by running a production VM and a security VM. Hizver and Chiueh 8 developed a real-time kernel data structure monitoring (RTKDSM) system based on Volatility to simplify and automate the analysis of VM execution states. The RTKDSM system is designed as an extensible software framework that can be extended to perform application-specific VM state analysis. All these works considered information only at the process or thread level, without considering operations at the function level.
A runtime monitor observes a system’s behavior and detects whether it is consistent with a specification, providing high confidence at runtime that the system satisfies some security properties. Eagle 15 is an external DSL (domain-specific language) and a linear-time mu-calculus for monitoring, with past-time as well as future-time operators. Because of inefficiencies in the implementation of Eagle, RuleR, 16 which supports a rule-based specification language and is well suited to processing data-rich events, was proposed. The Monitoring and Checking (MaC) toolset was developed as a systematic monitoring framework 17 targeting soft real-time applications written in Java. Havelund 18 implemented an internal rule DSL in Scala and optimized the Rete algorithm for RV. However, these approaches, which embed a monitor into the protected system, not only change the system with possibly negative results but also give malware a chance to subvert the monitoring mechanism.
Schneider 19 proposed an automata-theoretic characterization of property enforcement mechanisms based on EM by embedding the enforcement code into the target system. Specifically, an EM-enforceable policy prescribes access event sequences recognized by a Büchi automaton. Based on the security automata proposed by Schneider, Fong 20 defined shallow history automata as a specific type of memory-bounded monitor based on finite-state truncation automata, which can enforce history-based access-control properties. Ligatti et al.21,22 introduced edit automata, which are transducers with infinitely many states. They were inspired by enforcement mechanisms that can insert and suppress system actions as well as terminate a system when a policy is violated. Falcone et al. 23 focused on trace properties identified by different automaton models according to the safety-progress hierarchy. The monitor to a trace universe was introduced by Ligatti and further improved by Chabot et al. 24 More recently, Basin et al. 25 distinguished system actions that are controllable by an enforcement mechanism of Schneider’s security automata from actions that are only observable and whose execution cannot be prevented, which enables the monitor to reason about timing constraints, such as “clock ticks.” Ligatti and Reddy 4 and Dolzhenko et al. 5 proposed a theory of runtime enforcement based on MRAs, which takes output results (together with input actions) into account. This improvement overcomes the limitation of existing general models, which cannot specify policies defined or limited on output executions. None of these works considered the computation model of the protected system.
Conclusion
We have presented a non-intrusive solution to enforce security policies on open SCADA systems based on VM technologies. In this solution, SCADA is broken into services which are exposed through VMs, and VMs are monitored by an “out-of-box” method. The solution is able to ensure the safety of services and local physical devices and builds a security gate at the bottom layer for the whole SCADA system. Experiments and analysis show that such a solution is feasible to provide a non-intrusive runtime guarantee to open SCADA systems.
In our approach, the monitor and the protected system are isolated. We propose an approach to extract the system execution, which we call refined traces, by combining network events and runtime states based on the service model. Note that the abstract refined trace in our approach is made of untrusted events from the network (which may be spurious) and actual events reconstructed from VM memory (which are actually executed by the process), so monitoring may be less accurate than monitoring with actual executions. This is the main difference between our approach and the traditional host-based approaches, and it brings both advantages and disadvantages. The main advantages are as follows:
The monitor is non-intrusive;
No code changes need to be made to the protected system;
Some intrusion detection systems based on VMI technology can be implemented more effectively and with less impact on the protected services;
A more flexible recovery mechanism based on VM technologies can be exploited when the protected system is compromised.
The main disadvantage of the solution is that the refined trace may be less accurate than the actual one. This may make the runtime monitoring and enforcement too strict. For example, an event
Another limitation of our solution may be that if some concrete policy is defined on some events that we cannot decide whether they had actually occurred, for example,
Our future work is a detailed recovery scheme for recovering running services and related information in VMs when an anomaly is detected.
Footnotes
Academic Editor: José Molina
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by National Natural Science Foundation of China (Nos 61372115 and 61132001), National 973 Program (No. 2012BAH94F02), National High-tech R&D Program of China (No. 2013AA102301), and China Postdoctoral Science Foundation funded project (No. 2016T90067).
