Abstract
Numerous security concerns exist in smart home systems in which Internet of Things devices are connected through a home network to enable control using a centralised gateway with a handset device from the Internet. Safeguarding personal information privacy is an increasing concern in smart living services. To guarantee the mobile security of smart living services, security managers use taint checking approaches with dynamic taint propagation analysis operations to examine how a software-defined networking app uses sensitive information and investigate suspicious security vulnerabilities of devices and the effects of the spread of taint propagation over the Internet by identifying taint paths. For solving the dynamic taint propagation analysis problem, most approaches focus on cloud computing applications (apps) with malware threat analysis that involves program vulnerability analyses, rather than on the risk posed by suspicious apps connected to the cloud computing server. Accordingly, this article proposes a taint propagation analysis model incorporating a weighted spanning tree analysis scheme for tracking data with taint marking using several taint checking tools with an open software-defined networking architecture for solving the dynamic taint propagation analysis problem. In the proposed model, Android programs perform dynamic taint propagation to analyse the spread of risks posed by suspicious apps connected to the centralised gateway in a smart home system. In probabilistic risk analysis, risk and defence capability are used for each taint path to assist a defender in recognising the attack results against network threats caused by malware infection and to estimate the losses of associated taint sources. A case of threat analysis of a typical cyber security attack is presented to demonstrate the proposed approach. 
A new approach was used for verifying the details of an attack sequence for malware infection by incorporating a finite state machine to appropriately represent the real dynamic taint propagation analysis situations at various configuration settings and safeguard deployment. The experimental results proved that the threat analysis model enables a defender to convert the spread of taint propagation to loss and estimate the risk of a specific threat using behavioural analysis associated with 60 families of real malware. Consequently, our scheme was significantly effective in predicting the risk and loss of tainted data propagation for security concerns in smart home systems when the number of taint paths associated with the propagation rules discovered through taint analysis was increased.
Keywords
Introduction
Cloud computing uses the Internet to deliver information services to open networks and involves the deployment of large-scale platforms; therefore, commercial data on the clouds might become targets of network attacks. For example, in September 2014, celebrities’ private photographs stored in iCloud were disclosed. To prevent such a hack, a security analysis is required of the information exchange both among the modules within an app (intramodule) and between an app and programs on an external network (interapp); this covers threats such as illegal memory access and buffer overflow attacks at the intramodule level and malware downloaded by botnets at the interapp level. In practice, new malware attacks can evade firewall-based detection by bypassing stack protection and using Hypertext Transfer Protocol logging, kernel hacks and library hack techniques, and can thereby reach smart living appliances (apps). Effective security defence mechanisms involving threat analysis techniques are essential for detecting intruder attacks in open networks. Taint checking (TC) is a blacklisting approach because it asserts that certain values are dangerous for taint analysis. Generally, TC involves evaluating specific security risks from attacks such as memory corruption and a buffer overflow attack. 1 Unlike the traditional threat analysis technique, in TC, the focus is more on detecting exploits, and data flow analysis is used to determine whether a value is derived from a user input. In taint analysis, modules connected to data originating from untrusted network channels are marked as contaminated, and a series of arithmetic and logic operations is used to track data with taint marking.
To guarantee the security of cloud computing systems, network administrators use TC approaches with an open software-defined networking (SDN) architecture to examine how a cloud app uses sensitive information and investigate suspicious behaviours of users before deploying an app. An SDN architecture may facilitate network-related security applications because of the controller’s central view of the network, and its capacity to track data with symbolic taint marking and identify the spread of intents. In performing malware threat analysis against unspecified malware attacks, network administrators can use a TC approach for tracking information flows between attack sources (malware) and detect vulnerabilities of targeted network apps.
Many TC approaches have been developed to enhance security by preventing malicious users from executing commands on a host computer,2–5 which require computing emulations with a set of distinct mission hosts in a virtual security experimental environment. As described previously, most existing TC schemes focus on examining the security hole for mobile device apps but do not fully address the severe issues regarding the risk posed by suspicious apps connected to security holes in the cloud computing server. The comparative analyses of TC schemes are provided in Table 1.
Comparison of TC approaches for cloud app security.
The literature contains various proposals for solving the TC problem based on sandbox analysis techniques (SATs). Although SATs have many advantages, they have several limitations. For example, multiple taint sources cannot be traced online. In addition, taint sources can be easily infected through app connections. Furthermore, an SAT does not adequately answer critical questions regarding (1) the security holes of various risk levels in the development of network apps and (2) the attack paths selected for introducing safeguards. By contrast, the TC scheme emphasises specific security risks primarily associated with websites that are attacked using techniques such as SQL injection and buffer overflow attacks. 1 Thus, TC techniques effectively specify possible targeted information flows between the attack sources (malware) and the security holes of network apps. Moreover, the technical content of an SAT differs from that of the traditional threat risk approach; that is, an SAT focusses on the detection of the dynamic behaviour between the suspicious host and the test app’s vulnerability, whereas the TC technique emphasises examining the scope of taint propagation within an intramodule or interapp. In addition, TC judges the intents of a contaminated information flow to establish whether its connection to a taint source derives from user input or from external taint sources, which allows the exact signature of exploits (SOEs) to be determined.
Accordingly, this study proposes a taint propagation analysis model that incorporates a weighted spanning tree analysis for tracking data with taint marking and identifying the spread of intents by considering the state transitions of a program in an SDN-based smart home security management system. Both dynamic taint analysis (DTA) and risk analysis are incorporated in the scheme and used for possible system exploits to solve the dynamic taint propagation analysis (DTPA) problem. In the following context, the taint analysis tools Valgrind 6 and ComDroid, 7 together with two malware behavioural analysis tools, Androguard 8 and DroidBox, 9 are used to assist defenders in examining data correctness. This analysis helps defenders to analyse the SOEs accounting for the behavioural profiles for assessing the risk of a taint source to the commercial data in the cloud computing server. In developing the proposed model, we considered two crucial aspects: (1) the investigation of the taint paths, propagation rules (PRs) and SOEs for a taint source by tracking data with symbolic taint marking and (2) the evaluation of risk in accordance with the taint paths responding to specific attacks.
The remainder of this article is organised as follows. Section ‘Related work’ reviews previous studies in this field. Section ‘Dynamic taint propagation model for threat risk analysis of mobile malware’ introduces the proposed taint propagation analysis model based on the behavioural profile of suspicious apps. An analysis of the results is presented in section ‘Cyber security app’. The taint risk in the DTPA process is discussed in section ‘Discussion’. Finally, section ‘Conclusion’ provides the conclusions of this study.
Related work
This section reviews two areas relevant to solving the DTPA problem in open networks, namely, TC approaches for assessing threats and security management using an SDN architecture.
TC approaches for assessing malware threats
Typically, the following two basic approaches are used for TC: (1) static taint analysis (STA): examine the program text and analyse over multiple paths of a program. Typically, STA is performed on a control flow graph in which statements are nodes, and an edge exists between nodes if there is a possible transfer of control. (2) DTA: investigate the instructions executed when the program runs. In particular, DTA inspects the single path taken in a single run and determines exact taint values. 10 By contrast, STA might generate extra taint paths from the control flow graph and result in false positives.
Normally, three crucial steps are followed in DTA: (1) taint source analyses, (2) propagation analysis with taint marking and (3) testing of the code with the assertions. Since the 2000s, TC approaches have been used for exploring the vulnerabilities in information systems and identifying possible impacts from malware attacks by altering the flow of program execution. In comparison to STA, DTA tracks intents with taint marking to the information flow of the intramodule or interapp; consequently, it is relatively accurate for monitoring the running app and is suitable for tracking the flows of private sensitive data. 11
Many TC approaches incorporate taint propagation algorithms for detecting malware. TC techniques for an analysis algorithm, including Argos, 2 Panorama, 3 TaintCheck, 4 ComDroid 7 and TaintDroid 12 approaches, have been used to increase the analysis precision by providing insights into how activities and outputs are formulated through the transmission of intents from tainted sources. These TC schemes are summarised in Table 2.
Security analysis approaches for TC.
TC: taint checking; API: application programming interface.
Security management using an SDN architecture
SDN began after Sun Microsystems released Java in 1995. SDN enables network administrators to manage network services through the abstraction of high-level functionality. SDN is a dynamic, manageable, cost-effective and adaptable architecture suitable for the high-bandwidth, dynamic nature of today’s applications. SDN architectures decouple network control and data forwarding functions, enabling network control to be directly programmable and the underlying infrastructure to be abstracted from applications and network services. 15
In smart living appliances, Internet of Things (IoT) devices are connected through a home network to enable control using a centralised network controller associated with remote network devices from the Internet. When integrated by wireless sensing and information communication technologies for IoT devices, home systems and appliances are advantageous because they can communicate in an efficient manner that provides living convenience, health promotion and safety benefits. Because personal information in smart living services cannot be disclosed without legal permission, using private information, including personal IDs, locations, biometrics and secret data, requires ensuring its security.
Theoretically, an SDN-based network architecture provides a user with secure remote monitoring and management of smart living appliances for the home environment. Examples include remote control lighting systems, fire warning, the remote monitoring of infant activity, pet status remote monitoring and the transmitting of personal health information to a cloud platform for analysis. Although mobile devices and IoT devices with an SDN architecture for smart living appliances have improved the convenience of our daily lives, they also pose a threat to personal privacy and information security for users. Moreover, networking manufacturers generally collect personal information and private habits without warning. Consequently, mobile devices or IoT devices may be controlled by remote malicious users, which can cause the improper disclosure or sharing of information. In practice, devices in smart homes provide network-layer security and privacy control mechanisms to ensure the privacy and information security of users by monitoring network activity and detecting suspicious network behaviour. Accordingly, a dynamic security mechanism based on network-layer security and privacy control mechanisms is adopted in the SDN controller (Figure 1).

SDN-based network architecture.
In addition to existing security mechanisms, novel approaches using SDN as a network-wide control mechanism for resolving high-risk security concerns, including distributed denial of service detection and mitigation,16,17 worm propagation 18 and botnet protection, 19 have been suggested by several researchers.
Dynamic taint propagation model for threat risk analysis of mobile malware
The proposed model was designed to examine information flows associated with suspicious taint sources, use a taint graph to track the suspicious connection with the taint source and identify the spread of tainted data.
Basic concept
Suppose there are taint sources and a taint sink of a targeted network (i.e. an SDN-based smart home system) in which hackers attempt to compromise network security. For solving the DTPA problem, our approach focusses on developing apps using malware threat analysis that involves program vulnerability analyses for identifying both security holes and malicious behaviours associated with malware attacks.
The first task is initialising and marking all modules for cloud services as a clean state in a targeted network. The basic concept involves using the weighted spanning tree algorithm (WSTA) and determining the cut sets for DTPA on the basis of taint marking; that is, the information type is assigned to tainted data. A taint graph was constructed by starting from a taint source to any connected module through intent transmission. A depth-first search (DFS) was then performed on a taint graph in accordance with taint marking to gradually connect the edges between the two adjacent modules, which pass tainted data (i.e. intents) through the network. In such a situation, the module is marked as contaminated, and the module is marked as clean if no tainted data transmit; that is, the edges between the two adjacent modules are disconnected (dotted lines in Figure 2).

Use of taint analysis to determine whether the module is contaminated via intents.
At the end, after visiting all modules, the traversal path of taint propagation was constructed using the DFS spanning tree algorithm. The taint graph represents how tainted data or messages propagate during program execution and is depicted as a directed graph G = (V, E) with a set of vertices V = {u, v1, v2,…, vm}, where u represents a taint source; v1, v2,…, vm represent a set of modules in an app or a cloud service; and E = {e1, e2,…, en} represents a set of edges between two neighbouring modules of G (Figure 2).
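The marking procedure described above can be illustrated with a minimal Python sketch. The adjacency-list layout, the example module names and the `taint_dfs` helper are illustrative assumptions, not the authors' implementation; each edge carries a Boolean that records whether tainted intents were observed on it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical taint graph: module -> list of (neighbour, tainted_data_observed).
Graph = Dict[str, List[Tuple[str, bool]]]


def taint_dfs(graph: Graph, source: str) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Depth-first traversal from a taint source.

    Returns the set of contaminated modules and the edges of the DFS
    spanning tree along which tainted data propagated.
    """
    contaminated: Set[str] = {source}
    tree_edges: List[Tuple[str, str]] = []

    def visit(node: str) -> None:
        for neighbour, tainted in graph.get(node, []):
            # An edge counts as connected only if tainted data pass through it.
            if tainted and neighbour not in contaminated:
                contaminated.add(neighbour)
                tree_edges.append((node, neighbour))
                visit(neighbour)

    visit(source)
    return contaminated, tree_edges


# Assumed example: u is the taint source; the edge u -> v2 carries no tainted data.
example: Graph = {
    "u": [("v1", True), ("v2", False)],
    "v1": [("v3", True)],
    "v2": [("v4", True)],
    "v3": [],
    "v4": [],
}
contaminated, tree_edges = taint_dfs(example, "u")
print(sorted(contaminated))
print(tree_edges)
```

In this example, v2 stays clean because its incoming edge is disconnected, so v4 is never reached even though the edge from v2 to v4 would carry tainted data.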
As shown in Figure 2, existing TC approaches do not consider the state transition of an app module in a system, including preconditions and postconditions. In other words, although the receiver module accepted intents, defenders used security protection, such as bug fixes in the operating system and a firewall in the network system, to repel the taint propagation results. However, if defenders do not patch the security vulnerabilities, tainted data would spread. Thus, this study incorporated a finite state machine (FSM) to appropriately represent DTPA using a network flow analysis technique (Figure 3). Figure 3 shows that the FSM may generate uncontaminated states for a module in a taint graph in defence situations and contaminated states for the same module shown in Figure 4. Using an FSM-based taint propagation of information analysis produced various results that represent the actual situations in which various configuration settings, security fixes and safeguard deployments were set up in the DTPA processes.

Incorporation of an FSM into taint analysis.

Taint analysis by considering the state transition of a module.
Figures 3 and 4 show that taint analysis that considers the state transition of a module is more suitable than the analysis in existing TC approaches. As shown in Figure 4, if a module receives tainted data from an untrusted source, accepts them and outputs intents to the connected modules, it is marked as being in a contaminated state, and the edges between these modules are connected. By contrast, a module is marked as uncontaminated if the edges between the two modules are disconnected.
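A minimal Python sketch of such a state machine is given below. The two states, the `patched` precondition and the `ModuleFSM` class are hypothetical simplifications chosen for illustration; the FSM in Figures 3 and 4 may contain further states and transitions.

```python
from enum import Enum


class State(Enum):
    CLEAN = "clean"
    CONTAMINATED = "contaminated"


class ModuleFSM:
    """Hypothetical two-state machine for one module in a taint graph."""

    def __init__(self, name: str, patched: bool = False) -> None:
        self.name = name
        self.patched = patched  # precondition: is a safeguard deployed?
        self.state = State.CLEAN

    def receive(self, tainted: bool) -> State:
        """Process one incoming intent and return the resulting state."""
        if tainted and not self.patched:
            self.state = State.CONTAMINATED
        return self.state

    def forwards_taint(self) -> bool:
        """Postcondition: only a contaminated module keeps its outgoing edges connected."""
        return self.state is State.CONTAMINATED

    def apply_patch(self) -> None:
        """Deploy a safeguard; assumed here to restore the clean state as well."""
        self.patched = True
        self.state = State.CLEAN


guarded = ModuleFSM("v1", patched=True)
exposed = ModuleFSM("v1")
guarded.receive(tainted=True)
exposed.receive(tainted=True)
print(guarded.state, exposed.state)
```

The same module therefore ends in different states depending on whether a safeguard is in place, which is the distinction that a purely graph-based taint analysis does not capture.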
This study proposes an SDN-based smart home system with an OpenFlow protocol for network configuration, traffic management and security monitoring (Figure 5), in which an OpenFlow switch (network device) and a controller communicate via the OpenFlow protocol. In the proposed SDN system, the SDN architecture enables the network controller to track data with taint marking and determine the path of network packets across a set of networks of switches with two basic components: (1) a control tool (controller) for determining the network packet flow and (2) the OpenFlow routing table (Flow Table) for selecting the network packet transmission path. Thus, OpenFlow can enable the remote administration of packet forwarding tables for layer 3 switch by adding, modifying and removing packet matching rules and actions. 20

SDN-based smart home system (OpenFlow enabled).
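The adding, modifying and removing of packet matching rules and actions can be pictured with a toy match-action table in Python. This sketch does not use the OpenFlow wire protocol or any controller API; the `FlowRule` and `FlowTable` classes, the header field names and the action strings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowRule:
    match: Dict[str, str]  # header field -> required value (hypothetical field names)
    action: str            # e.g. "forward:2" or "drop" (hypothetical action strings)
    priority: int = 0


@dataclass
class FlowTable:
    rules: List[FlowRule] = field(default_factory=list)

    def add(self, rule: FlowRule) -> None:
        self.rules.append(rule)
        # Keep the highest-priority rule first so that lookup can stop at the first match.
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def remove(self, match: Dict[str, str]) -> None:
        self.rules = [r for r in self.rules if r.match != match]

    def lookup(self, packet: Dict[str, str]) -> Optional[str]:
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return None  # table miss; a real switch may hand such packets to the controller


table = FlowTable()
table.add(FlowRule({"dst_ip": "10.0.0.5"}, "forward:2", priority=10))
# Assumed safeguard: drop everything from a host flagged as a taint source.
table.add(FlowRule({"src_ip": "10.0.0.66"}, "drop", priority=100))

suspicious = {"src_ip": "10.0.0.66", "dst_ip": "10.0.0.5"}
benign = {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.5"}
print(table.lookup(suspicious), table.lookup(benign))
```

Removing the high-priority rule with `table.remove({"src_ip": "10.0.0.66"})` would let the flagged host reach 10.0.0.5 again, which illustrates how a controller could switch a safeguard on and off for a taint path.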
As Figure 5 illustrates, two management tools for OpenFlow are incorporated into the smart home system to detect suspicious behaviour: (1) Open vSwitch, which is used to build a network device for data forwarding and packet routing management in the SDN infrastructure layer, and (2) Floodlight, which is an open-source tool used as a supervisor to manage the SDN control layer and application layer for access control and traffic and security monitoring.
Penetration testing with malware through a trace of intents is an effective approach that facilitates evaluating the security of SDN infrastructures to safely identify vulnerabilities of systems. In practice, sending intents to the improper app modules can lead to user privacy leakage and permissions transferred among apps. Therefore, a taint marking–based framework for tracking intents is proposed and described in the following.
Constructing a taint graph involves large amounts of intent transmission from taint sources to taint sinks and often requires numerous logic operations and cost analyses. Therefore, the WSTA associated with an open SDN controller was used to assist defenders in tracking data with taint marking for DTPA (Figure 6). In addition, the mechanism was developed to analyse the spread of taint propagation in the information flow between mobile devices and cloud computing servers and to help defenders to define the SOEs for taint sources.

Framework for DTPA with an open SDN controller.
Figure 7 shows that the malware behavioural analysis incorporated in the framework is suitable for taint source detection and propagation analysis using application programming interface (API) hooking and taint marking to identify the taint paths of the propagation spread. Three subprocesses included in the DTPA process are taint source analysis, taint propagation analysis and determination of SOEs.

Execution process of the dynamic taint propagation analysis shown in Figure 6.
TC process of malware threat analysis
Assume that partial system vulnerabilities of a cloud service are known and a set of behavioural profiles associated with each malware has been identified. The following DTPA algorithm includes a new resolution process of taint propagation to track the taint paths of propagation according to the behavioural profiles of a specific threat, thereby estimating the SOEs of a successful attack by malware.
Step 1: taint source analyses
Step 1.1: malware behavioural analyses
To perform behavioural analysis of malware, a common approach is to use virtual machine analysis techniques. Generally, a sandbox provides capabilities such as stopping at a control point to prevent potentially dangerous programs from running, as well as monitoring and recording activity in a closed environment when the malware is running. Malware signatures were derived by combining both the security flaws of static code analysis and dynamic behavioural patterns of behavioural analysis with the support of a virtual machine emulator to facilitate the detection of unidentified mobile malware and variants.
Dynamic behavioural analysis was conducted to obtain the major runtime behaviour regarding access to the network, file, registry and disc of each app, which was compared with malicious and normal behavioural profiles. Static analysis was conducted to examine the specific source codes and binary strings of malware, and comparison results were recorded in a log file to discriminate the differences in the signatures of malware variants and new viruses. In the malware behavioural analyses, two freely available tools, Androguard 8 and DroidBox, 9 developed by the Honeynet project, were used to generate malware signatures as the source data set; these signatures were used in the subsequent taint analyses. An experiment involving the following four steps was conducted:
Download the suspected apps (.apk file format) from a mobile phone;
Perform in-depth behavioural analysis on the DroidBox platform;
Perform code analysis on the Androguard platform using reverse engineering;
Output the synthesis report.
Step 1.2: malicious activity analyses of taint sources
This step involves detecting violations of the permission rules as part of the taint source analyses. Four protection levels exist for the access to the system API with permissions on Android platforms: Normal, Dangerous, Signature and SignatureOrSystem. Apps request the appropriate permissions in their manifests to obtain a privileged access to system commands or protected API calls. To analyse the behaviours of the taint sources, the commands of program executables and the API calls that output to connected modules are listed as the basis of malicious instruction sets (MISs).
To identify the attack profiles of MISs, intent-based analyses are adopted to inspect the transmission of intents from taint sources to the receiving module. Intents can be sent between three of the four Android component types: Activities, Services and Broadcast Receivers. Intents can be used to start Activities; start, stop and bind Services; and broadcast information to Broadcast Receivers. 7 Generally, intent operations include three types of instruction sets: (1) source: MISs for taint source analysis exchange parameters between the taint sources and sinks, such as startActivity, startService, stopService, bindService, read and gets; (2) sink: the instruction sets for a taint sink include receive, onReceive and write; and (3) propagation: commands for taint propagation comprise sendBroadcast and sendOrderedBroadcast. Once the MISs are located, intent-based analysis can be used to construct the malicious behavioural profiles for taint sources.
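The three instruction sets listed above can be turned into a simple lookup for sorting an observed call trace into a behavioural profile. The following Python sketch is illustrative only: the `classify_call` and `build_profile` helpers and the example trace are assumptions, and a real analysis would work on hooked API calls rather than on bare method names.

```python
from typing import Dict, Iterable, List

# Instruction sets as listed in the text above.
SOURCE_CALLS = {"startActivity", "startService", "stopService", "bindService", "read", "gets"}
SINK_CALLS = {"receive", "onReceive", "write"}
PROPAGATION_CALLS = {"sendBroadcast", "sendOrderedBroadcast"}


def classify_call(name: str) -> str:
    """Map a call name to its MIS category, or 'other' if it is not listed."""
    if name in SOURCE_CALLS:
        return "source"
    if name in SINK_CALLS:
        return "sink"
    if name in PROPAGATION_CALLS:
        return "propagation"
    return "other"


def build_profile(trace: Iterable[str]) -> Dict[str, List[str]]:
    """Group the calls of one trace by category, preserving their order."""
    profile: Dict[str, List[str]] = {"source": [], "sink": [], "propagation": [], "other": []}
    for call in trace:
        profile[classify_call(call)].append(call)
    return profile


# Hypothetical call trace of a suspicious app.
trace = ["startService", "sendBroadcast", "onReceive", "getDeviceId", "write"]
profile = build_profile(trace)
print(profile)
```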
Step 2: examining security holes in cloud apps
This step identifies security holes using STA that involves program vulnerability analyses for developing apps. In principle, the more security holes an app contains, the more opportunities exist for malicious behaviours.
Available TC schemes for examining security holes using analysis mechanisms, such as Panorama, 3 Valgrind, 6 TEMU 21 and TAJ, 22 are capable of inspecting security holes in apps. Valgrind, 6 developed by Seward, was proposed for automatic multimemory debugging and detailed memory leak detection. Valgrind was originally designed to be a free memory debugging tool for a Linux operating environment, and it has evolved into a generic framework for creating dynamic analysis tools such as checkers and profilers for inspecting programs being developed. In particular, Valgrind can recompile a binary code to run on the host and target of the same architecture using a simulated CPU technique. Therefore, it is suitable for supporting the inspection of data at possible exploits through the examination of program control transfers.
Valgrind contains multiple tools. The most frequently used tool is Memcheck. Valgrind is a virtual machine using just-in-time compilation techniques, including dynamic recompilation. In STA, six types of test cases are examined using Memcheck: (1) invalid write, (2) conditioned jump on an uninitialised value, (3) memory boundary check for invalid read and write, (4) source and destination overlap, (5) inconsistencies between dynamic memory allocation and release and (6) memory leak. The evaluation of risk based on the security holes responding to a specific threat is described in the following.
Step 3: DTPA
This step helps defenders to examine whether a testing app is vulnerable to potential cyber attacks by comparing normal and malicious behavioural profiles (i.e. permissions, PRs and SOEs) between a legitimate app and a malware website. In DTPA, PRs and SOEs for taint sources are characterised using three subprocesses: (1) correlation analysis (CA) for MISs, (2) taint propagation analysis using the WSTA and (3) the determination of SOEs.
Step 3.1: correlation analysis for MISs
To identify the spread of taint propagation, PRs must be defined by tracking the information flow. Theoretically, PRs can be derived using CA for app components and relevant tainted data via the execution of MISs to analyse the taint propagation behaviour. Furthermore, many CA approaches incorporate association rules, frequent episode rules and data mining approaches to facilitate precise prediction for generating PRs.
PRs for tainted data can be analysed by comparing the distinct calling sequences of the malicious API calls between the malware website and the proper app, in which the calling sequences are established using a pair of the form [Source_j, Sink_k, Sink_{k+1}, …]. In practice, network analysis tools, such as NetFlow and Wireshark, can be used to assist a defender in collecting large amounts of logs to identify the specific calling sequences of MISs. Finally, the relevant calling sequences of MISs based on program execution for each taint source j are summed to generate the PR, which has the form PRi (Apk name, Taint source, Module and Calling sequence of MISs), as shown in Table 3.
Examples of the taint propagation rule.
As shown in Table 3, the calling sequence of MISs represents the propagation chain of a taint source i (i = 1,…, m) and comprises a set of methods used in evaluating the attack signature for each module. The module name associated with a specific MIS is measured using a set of calling sequences, where a longer calling sequence might cause a higher risk and severe loss.
When deciding the calling sequence of MISs (Table 3), determining the exact pattern of the taint propagation actions is difficult. On the basis of the concept of an intrusion detection system, the occurrence of taint propagation is solved using frequent episode rules 23 by accumulating and associating the security logs as follows. Generally, episodes are partially ordered sets of events. The frequent episode rule is used to determine the specific event sequences for appropriately determining the MISs of a taint source, as shown in Figure 8.

Determine the exact pattern of the taint propagation using sliding windows.
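Given an event sequence s = (s; Ts; Te) and a window width win, let the time window of an episode be given by w = (w; Ts; Te). The support degree of an episode is defined as the fraction of windows where the episode occurs. Theoretically, given s and win, the support degree of an episode (α) (i.e. MISs) for a single taint path j in s is
$$\mathrm{sup}(\alpha) = \frac{\left|\{\, w \in W(s, win) : \alpha \text{ occurs in } w \,\}\right|}{\left|W(s, win)\right|} \qquad (1)$$
where W(s, win) denotes the set of all windows of width win on s.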
Once sup(α) is obtained, it can be used to predict the probability of an attack occurrence pij for a single taint path j from taint source i. In other words, sup(α) reveals the connections between attack events with MISs in the given security event sequence. Eventually, the normal behavioural profiles for apps are compared with the calling sequence of MISs for a taint source to determine whether an app is vulnerable to a specific taint source.
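The support degree can be computed with a short sliding-window routine. The Python sketch below is a simplified reading of the frequent episode approach under stated assumptions: timestamps are integers, an episode is treated as a serial (strictly ordered) list of event types, and windows are the half-open intervals [t, t + win) that overlap the sequence. The function names and the example log are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, int]  # (event type, integer timestamp)


def occurs_serially(episode: Sequence[str], window_events: List[Event]) -> bool:
    """True if the event types of the episode appear in order within the window."""
    idx = 0
    for etype, _ in sorted(window_events, key=lambda e: e[1]):
        if idx < len(episode) and etype == episode[idx]:
            idx += 1
    return idx == len(episode)


def support(episode: Sequence[str], events: List[Event],
            t_start: int, t_end: int, win: int) -> float:
    """Fraction of sliding windows of width `win` in which the episode occurs.

    Windows are [t, t + win) for t = t_start - win + 1, ..., t_end - 1,
    so every event falls into exactly `win` windows.
    """
    hits = 0
    total = 0
    for t in range(t_start - win + 1, t_end):
        window = [e for e in events if t <= e[1] < t + win]
        total += 1
        hits += occurs_serially(episode, window)
    return hits / total if total else 0.0


# Hypothetical security log with event times in [1, 8).
log = [("startService", 1), ("sendBroadcast", 2), ("onReceive", 3),
       ("startService", 6), ("sendBroadcast", 7)]
sup = support(["startService", "sendBroadcast"], log, t_start=1, t_end=8, win=3)
print(sup)
```

For this log there are nine windows, and the episode occurs in four of them, so the support is 4/9. Treating the episode as serial is a design choice of the sketch; parallel or partially ordered episodes would need a different occurrence test.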
Step 3.2: taint propagation analysis using the WSTA
Suppose a spanning tree T models the network flow for DTPA using graph theory. Here, T is a subgraph of an undirected graph G that forms a tree, including all vertices of G and a subset of its edges without any loop.
To evaluate the taint risk of DTPA, the probability of attack success for taint path
According to equation (2), the taint risk matrix was used to describe the probability of attack success for taint source i (i = 1,…, m), which was caused by a set of security holes that generated taint paths j (j = 1,…, n) by propagating taint and has the form
$$\left[ p_{ij} \right]_{m \times n} = \begin{bmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{m1} & \cdots & p_{mn} \end{bmatrix}$$
Initially, the WSTA ensures the connection between any two nodes of a network in which there is only a single path without a loop. The modelling of taint propagation based on the connectivity of the information flow is then used to construct a spanning tree algorithm. In particular, a taint graph is constructed by aggregating spanning subtrees, which involve edges with weights. Theoretically, the graph must assign a greater weight to an edge when a high degree of the spread of taint propagation causes a serious threat.
A defender assigns a weighted value to each edge at time t by considering the normalised ratio of the loss of data leaks for a single taint path to those for all taint paths for a specific threat in a taint graph G, as follows
In practice, the weight of a traversal path for a taint path must be updated by the continual spread of taint propagation along its path at time t+1, and this may increase the loss
By contrast, the defence resources invested on each path through which tainted data pass help a defender to identify the total cost of securing the possible taint paths. If the project time or defence resource is limited, the minimum spanning tree (MST) can be applied to determine the priority of guarding the taint path, thereby preventing maximum leaks to users using minimum resources. After defining the PRs for a taint source, the WSTA is used to identify the spread of taint propagation associated with the development of the TC tool.
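The edge weighting described earlier, a normalised ratio of the loss on one taint path to the loss on all taint paths, can be sketched in Python as follows. This is one plausible reading of the verbal description rather than the authors' formula; the `edge_weights` helper, the path names and the loss figures are hypothetical.

```python
from typing import Dict


def edge_weights(losses: Dict[str, float]) -> Dict[str, float]:
    """Normalise per-path losses so that the weights of all taint paths sum to 1."""
    total = sum(losses.values())
    if total <= 0:
        raise ValueError("total loss must be positive")
    return {path: loss / total for path, loss in losses.items()}


# Assumed losses of data leaks per taint path at time t.
losses_t = {"path1": 40.0, "path2": 10.0, "path3": 50.0}
weights_t = edge_weights(losses_t)

# At time t+1 the taint keeps spreading along path1, so its loss grows.
losses_t1 = dict(losses_t, path1=90.0)
weights_t1 = edge_weights(losses_t1)

print(weights_t)
print(weights_t1)
```

Because the weights are renormalised at each time step, a growing loss on one path raises its own weight and lowers the relative weight of the others, which in turn changes which paths a defender would guard first.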
Generally, two algorithms for the WSTA can be used: Dijkstra’s algorithm and Prim’s algorithm. The MST algorithm proposed by Dijkstra is used for identifying the shortest paths between nodes in a graph. It selects the unvisited vertex with the lowest distance, calculates the weights across it to each unvisited neighbour vertex and updates the neighbour’s weighting value if it is smaller. The vertex is marked as visited when the neighbour vertex is updated. By contrast, Prim’s algorithm is a greedy algorithm that identifies an MST for a weighted graph. It identifies a subset of edges that forms a tree including every vertex, and the total weight of all the edges in the tree is minimised. The algorithm operates by building this tree one vertex at a time from an arbitrary starting vertex and at each step adding a low-cost possible connection from the tree to another vertex. In WSTA analysis, taint risk assessment is performed in the following.
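Before the risk assessment, a minimal Python sketch of Prim’s algorithm may help to make the greedy step concrete. The graph layout, the module names and the edge weights are illustrative assumptions; the sketch uses a binary heap from the standard library and expects a connected, undirected graph given as a symmetric adjacency list.

```python
import heapq
from typing import Dict, List, Tuple

# Undirected weighted graph: vertex -> list of (neighbour, weight).
WeightedGraph = Dict[str, List[Tuple[str, float]]]


def prim_mst(graph: WeightedGraph, start: str) -> Tuple[List[Tuple[str, str, float]], float]:
    """Return the MST edges (u, v, weight) in insertion order and the total weight."""
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(heap)
    tree: List[Tuple[str, str, float]] = []
    total = 0.0
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)  # cheapest edge leaving the current tree
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, w))
        total += w
        for nxt, w2 in graph[v]:
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return tree, total


# Hypothetical taint graph with normalised edge weights.
g: WeightedGraph = {
    "u": [("v1", 0.4), ("v2", 0.1)],
    "v1": [("u", 0.4), ("v2", 0.3), ("v3", 0.2)],
    "v2": [("u", 0.1), ("v1", 0.3), ("v3", 0.5)],
    "v3": [("v1", 0.2), ("v2", 0.5)],
}
mst_edges, mst_weight = prim_mst(g, "u")
print(mst_edges, mst_weight)
```

Starting from u, the tree grows through the edges u-v2, v2-v1 and v1-v3, and the heavier edges u-v1 and v2-v3 are left out.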
DTPA was used to examine the various vulnerabilities in information systems, identify risks regarding tainted data propagation and resolve high-risk security concerns. The proposed model was designed to describe the SOE and estimate the risk of each taint path for selecting appropriate safeguards to defend against cyber attacks. In DTPA, the taint risk (ri) is characterised using two quantities referring to the national vulnerability database (NVD) of common vulnerability scoring system (CVSS): (1) the probability of attack success for a single taint path j (
Step 3.3: determination of SOEs
The recognised security vulnerabilities of network apps have been investigated, examined and reported. For example, Mitre Corporation maintains a list of disclosed vulnerabilities in a system called common vulnerabilities and exposures (CVEs), in which vulnerabilities are scored using a CVSS.
The loss of data leaks may correlate closely with the spread of taint propagation. The SOEs indicate a set of possible attack profiles 24 caused by security holes for a taint source. In this study, the form for SOEs was defined as (IDj, CVE, taint source url, affected module name and PRs) and was generated by collecting a set of PRs (PRk, PRk+1, PRk+2, …) for each taint source in accordance with the behaviour analysis of malware. Once the PRs and SOEs were identified, they were used to distinguish whether the app was malicious or benign by comparing the differences in the signatures of malware and legitimate apps. A detailed algorithm for tracking data with taint marking for dynamic propagation analysis is described in Program Description Language (PDL) and shown in Figure 9.

Algorithm of DTPA for cloud apps.
Cyber security app
In this section, the applicability of the proposed TC analysis model is demonstrated by considering an example of cloud security using an open SDN-based smart home system (Figure 10). In constructing the SDN infrastructure layer, our experiment incorporated both a Raspberry Pi and a wireless access point (AP) with a set of management software tools including OpenWrt, Open vSwitch and Floodlight to construct an SDN-based smart home system, in which Floodlight serves as a supervisor to manage the SDN controller layer and application layer for detecting network anomalies in security monitoring. As Figure 10 illustrates, Open vSwitch assists OpenFlow switches in communicating with each other using the OpenFlow protocol via a secure channel to connect the SDN controller. In the proposed SDN project, the SDN controller uses the Ryu SDN framework on an Ubuntu operating system to determine the network configuration and network packet flow of network procedures, such as access control, network traffic and security monitoring.

SDN-based smart home security management.
The smart home gateway is responsible for connection to external communications on the Internet. SDN applications for smart home scenarios with home security monitoring, lighting control and temperature and humidity monitoring were synchronised with three subnetworks: 192.168.1.0/24, 192.168.2.0/24 and 192.168.3.0/24. Each of these subnetworks communicates via a Wi-Fi base station (wireless AP) with OpenFlow switches (Table 4).
Experiment environment.
SDN: software-defined networking.
OpenFlow switch adopts an Ethernet-enabled USB interface wireless AP.
In the proposed SDN system, security monitoring entails periodically collecting network statistics from OpenFlow switches and then applying classification algorithms to those statistics to detect any network anomalies. If an anomaly is detected, the application instructs the controller on how to reprogram the data plane to mitigate it.
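The collect-then-classify loop can be illustrated with a minimal sketch. The text does not name the classification algorithm, so a simple z-score threshold is swapped in here as a stand-in; the function name, the packet counts and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(history, latest, threshold=3.0):
    """Return the subnetworks whose latest packet count deviates strongly
    from their own history (z-score test, a stand-in classifier)."""
    flagged = []
    for subnet, samples in history.items():
        mu, sigma = mean(samples), stdev(samples)
        if sigma == 0:
            # No variation in history: any change counts as an anomaly.
            if latest[subnet] != mu:
                flagged.append(subnet)
            continue
        z = (latest[subnet] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(subnet)
    return flagged


# Hypothetical per-interval packet counts polled from the switches.
history = {
    "192.168.1.0/24": [100, 110, 95, 105, 102],
    "192.168.2.0/24": [40, 42, 38, 41, 39],
}
latest = {"192.168.1.0/24": 104, "192.168.2.0/24": 400}
anomalous = detect_anomalies(history, latest)
```

In a deployment, the flagged subnetworks would be handed to the controller application, which decides how to reprogram the data plane.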
For example, the Zitmo (Zeus-in-the-mobile) Trojan attacks Android and BlackBerry smartphones. As shown in Figure 11 and Table 5, an examination of the traffic information of the OpenFlow switch with taint sources revealed that Zitmo had a connection to the command-and-control (C&C) server used in a botnet, using IP 224.0.0.251 to perform multicast Domain Name System (DNS) queries.

An example of malicious activity analyses of a taint source with OpenFlow switches.
Malicious activities for malware Zitmo.
API: application programming interface.
In the experiment, the DTPA is constructed using the following three-step procedure involving the taint source analysis, malware behavioural analysis and DTPA on an Android platform.
Step 1: Taint source analyses
Step 1.1: malware behavioural analyses
A total of 60 malware families identified on mobile devices using active infections were obtained from blacklists published on the Dr Web and Contagio Blogger websites for an experiment conducted from February 2013 to February 2014. Initially, the suspected app (.apk) was downloaded from a smart device to both the DroidBox 9 and the Androguard analysis platform. The experimental results of the behavioural analysis and code analysis of the mobile viruses are shown in Table 6. Defenders can verify the details of an attack sequence to obtain the possible attack profile.
Relationships between viruses and behaviours.
API: application programming interface.
Step 1.2: malicious activity analyses of taint sources
To analyse the intent-based signature for taint source attacks and examine whether an app is vulnerable to special cyber attacks according to four protection levels, the defender performs the following operations: install the Android application package (APK) file into the testing folder; execute the TC tools ComDroid, API_Monitor and Wireshark; examine illegal memory access; collect message passing to trace the taint source; inspect API call outputs; investigate taint paths for service, activity, receiver and broadcast communication among modules; and analyse binary interactions with the environment. An example of malicious activity analyses of a taint source is shown in Table 7.
Malicious activities for a taint source.
In practice, the N-gram modelling scheme is used to obtain the most frequently occurring API calls for MISs. We randomly selected 30 suspicious apps to examine the vulnerability detection capability of ComDroid and API_Monitor in the proposed SDN system, thereby determining the common weaknesses of the apps. ComDroid detected a total of 179 exploits, 49 warnings for exposed components and 353 warnings for exposed intents across the 30 suspicious apps. In sum, the most frequently occurring API calls associated with the relevant MISs for the 30 suspicious apps were obtained by analysing intent source, sink and broadcast operations for input/output (I/O), service, activity and communication, as shown in Table 8.
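A minimal sketch of N-gram counting over API call traces is given below. The traces are invented for illustration and are not the logs of the 30 apps; the helper names are hypothetical.

```python
from collections import Counter


def api_ngrams(calls, n):
    """All contiguous n-grams of API calls in one trace."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def most_common_ngrams(traces, n=2, top=3):
    """Count n-grams across all traces and return the `top` most frequent."""
    counts = Counter()
    for calls in traces:
        counts.update(api_ngrams(calls, n))
    return counts.most_common(top)


# Hypothetical API call traces, one list per suspicious app.
traces = [
    ["getDeviceId", "openConnection", "sendTextMessage"],
    ["getDeviceId", "openConnection", "write"],
    ["getLine1Number", "getDeviceId", "openConnection"],
]
top_bigram = most_common_ngrams(traces, n=2, top=1)
```

Sequences that recur across many apps, such as reading a device identifier immediately before opening a network connection in this made-up data, are the candidates for an MIS.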
API calls for MISs with 30 taint sources.
API: application programming interface; MIS: malicious instruction set.
After defenders identify the API calls for MISs for the taint source, the attack profiles can be accumulated to perform TC analysis with the security holes.
Step 2: examining security holes in apps
In this step, two Valgrind-based analysis tools, Valgrind-3.9.0 for X86/Linux and X86/Android (4.0), were used to perform taint source analyses. Initially, Dedexer was used to disassemble DEX files; the Valgrind tool was then employed to record potential vulnerabilities for modules and intents, and Androguard was then used to confirm the results of Valgrind and to further track information flows between the attack sources (i.e. malware) and the possible tainted data of targeted network apps to reveal the possible taint source, as illustrated in Figure 12.

Experiment environment of taint analyses.
Essentially, six types of test cases for control variables of the module were examined using Valgrind: (1) invalid write, (2) conditioned jump on an uninitialised value, (3) memory boundary check for invalid read and write, (4) source and destination overlap, (5) inconsistencies between dynamic memory allocation and release and (6) memory leak. These test cases were examined by considering the taint policies of an organisation (Figures 13 and 14). For conducting a security check of network apps, Androguard can perform STA on the major activities (namely, MainActivity) of the apps to reveal the potential taint source.

Invalid write checking.

Conditioned jump on uninitialised value checking.
Typically, two auxiliary tools in Androguard, dex2jar-0.0.9.15 and jd-gui, are used to disassemble the binary files and recover the source code from a folder that contains classes.dex, a .APK file, a package and AndroidManifest.xml. Executing the tool ./dex2jar.sh generates the jar file for classes.dex. In the disassembly process, the tool jd-gui can unzip the jar file and obtain the source code. After analysing the source code, we observed that the hacker had set a thread in clService in the downloaded package for automatically reporting the associated information to a zombie using the HttpPost method (Figures 15 and 16).

Security check of the main activity for network apps.

Taint check of the source code.
Step 3: DTPA
In DTPA, the SOE of a taint source is characterised using three subprocesses: (1) correlation analysis for MISs, (2) taint propagation analysis using the WSTA and (3) the determination of SOEs.
Step 3.1: correlation analysis for MISs
To identify the MISs of a taint source, the TC analysis tool ComDroid 7 was employed to analyse the statistical information of the disassembled output from Dedexer and to log potential component and intent vulnerabilities. By using equation (3), the detection probabilities of the MISs that appeared most frequently in malicious behaviours were extracted from the collected file details.log regarding the intent creation and transmission operations in component modules (Table 9).
MISs of taint sources.
MIS: malicious instruction set.
Step 3.2: taint propagation analysis using the WSTA
In this step, ComDroid was used with the WSTA to monitor tainted data for the following purposes: (1) inspect intent transmission from taint sources to the receiving module, (2) examine the intents of intermodule transmission and (3) determine whether the connected app and database are contaminated. In practice, ComDroid can track the intents of communications among apps by analysing the file details.log and listing information on activity, service and broadcast sink. Figure 17 illustrates the analysis of suspicious links between malware and an app after the module received intents. A defender can click on allActions to examine the activity in detail (Figure 18).

Application of ComDroid to inspect exposures between malware and an app.

Actions performed by the receiving intents in an app.
A defender can apply the analysis results of ComDroid to derive the taint propagation rules for a specific taint source on the basis of the frequent episode patterns of security event logs using equation (1). The PRs derived from the experiment are shown in Table 10.
Examples of the taint propagation rule.
MIS: malicious instruction set.
After the formulation of PRs, DTPA tools can automatically correlate these PRs to perform taint propagation analysis along the possible taint paths. For example, for the APK:Geinimi-Banking Trojan invasion of cloud services (information available at www.ipay.com.cn), we downloaded samples from http://contagiominidump.blogspot.tw/2011/10/geinimi-banking-trojan-wwwipaycomcn.html to analyse the communication between Android apps and cloud services and then constructed a taint graph of information flow using the details.log files derived from ComDroid associated with the WSTA (Figure 19). In assigning a weight to an edge in a taint graph, a defender must consider the relative severity of data leaks. For example, a defender assigns a higher weight to the connection of a cloud server than to that of mobile devices. The Kruskal MST algorithm was used to remove the taint-free source of information flow in communication and generate a new taint graph (Figure 20) to decide the priority of securing the taint paths.
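The Kruskal step can be sketched as follows, assuming an undirected taint graph given as a list of weighted edges. The node names and weights are hypothetical and do not reproduce the graph in Figure 19; the weights are treated as costs to be minimised.

```python
def kruskal(nodes, edges):
    """Minimum spanning tree via Kruskal's algorithm with union-find.

    `edges` is a list of (weight, u, v) tuples; returns (u, v, weight).
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):  # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it adds no loop
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical taint graph for a Trojan sample, an app and a cloud service.
nodes = ["trojan", "app", "device_db", "cloud"]
edges = [
    (3, "trojan", "app"),
    (1, "app", "device_db"),
    (6, "app", "cloud"),
    (4, "trojan", "device_db"),
    (8, "device_db", "cloud"),
]
taint_mst = kruskal(nodes, edges)
```

Edges that would close a loop are discarded, so the result keeps one path between any two nodes, which is the property the WSTA relies on when ranking taint paths.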

Spanning tree approach for analysing the taint path.

MST approach for analysing the taint path.
Step 3.3: determination of SOEs
Vulnerabilities issued by United States Computer Emergency Readiness Team (US-CERT) for an Android platform are listed in Table 11. The proposed model was designed to describe the SOE and estimate the risk of each taint path for selecting appropriate safeguards to defend against cyber attacks. Once the system vulnerabilities, taint paths and PRs are identified, the defender can form the SOEs for tracking taint paths between the malicious website and the detected vulnerabilities of targeted network apps. The form for SOEs is defined as (IDj, TS, PRk, PRk+1, PRk+2,…), generated by collecting a set of PRs in accordance with the taint sources and the relevant packages. Examples of the SOEs are shown in Table 12. Furthermore, the zombie infections for a taint source are located in Honeynet Map for managerial purposes.
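The SOE form (IDj, TS, PRk, PRk+1, PRk+2, …) can be represented as a small record type. The sketch below is only an illustration: the class, the field names, the identifier format and the sample rule labels are assumptions and do not come from Table 12.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SOE:
    """One signature of exploit: an identifier, a taint source and its PRs."""
    soe_id: str
    taint_source: str
    propagation_rules: Tuple[str, ...]


def build_soes(rules_by_source: Dict[str, List[str]]) -> List[SOE]:
    """Collect the PRs of each taint source into one SOE per source."""
    ordered = sorted(rules_by_source.items())
    return [
        SOE(f"SOE-{i}", source, tuple(rules))
        for i, (source, rules) in enumerate(ordered, start=1)
    ]


# Hypothetical taint sources and propagation rule labels.
soes = build_soes({"ts_b": ["PR3"], "ts_a": ["PR1", "PR2"]})
```

Keeping the record immutable makes it safe to use an SOE as a signature when comparing a suspicious app against known attack profiles.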
Vulnerabilities for an Android platform.
CVE: common vulnerability and exposure; CVSS: common vulnerability scoring system.
Examples of SOEs.
SOE: signature of exploit.
Discussion
The taint risk in the DTPA process is discussed in the following. Because of the limited context available, a sample of the experimental data for the smart home system is reported in Table 13. In Table 13, 10 taint sources were tested using websites blacklisted by Google, including the captured malware samples from Contagio Blogger. Notably, taint sources were identified as risks from attacks caused by security holes in the apps and smart living services; Figure 21 shows the links among the probability of attack success for a taint source, the loss of possible spreads of the taint path and the taint risk. The results presented in Figure 22 show that the taint risk increases as both the degree of taint propagation and the loss of taint spread increase. In addition, losses of tainted data propagation increase with an increasing degree of taint propagation but decrease with the defence capability. The high loss originated from the unsettled vulnerabilities that caused the spread of taint propagation represented by the number of PRs, which is associated with a low defence capability against taint propagation.
Risks of a taint source with various spreads of taint propagation and impact.
PR: propagation rule.
The degree of taint propagation is calculated by converting the individual values of PRs to normalised scores, and the defence capability is scored on a typical 5-point Likert scale: low (L) [0, 0.2), medium low (ML) [0.2, 0.4), medium (M) [0.4, 0.6), medium high (MH) [0.6, 0.8) and high (H) [0.8, 1.0].
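The scoring can be sketched as follows. The interval boundaries are those stated above; the normalisation method is not specified in the text, so min-max scaling is assumed here, and the function names and sample PR counts are hypothetical.

```python
def normalise(values):
    """Min-max scale values to [0, 1] (an assumed normalisation method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def likert_level(score):
    """Map a score in [0, 1] to the 5-point scale used for defence capability."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < 0.2:
        return "L"
    if score < 0.4:
        return "ML"
    if score < 0.6:
        return "M"
    if score < 0.8:
        return "MH"
    return "H"  # the top interval [0.8, 1.0] is closed at both ends


# Hypothetical numbers of PRs for three taint sources.
degrees = normalise([2, 4, 6])
levels = [likert_level(s) for s in degrees]
```

The half-open intervals mean that a boundary value such as 0.2 falls in the higher band (ML), while 1.0 belongs to H because the last interval is closed.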

Risk represented by loss of data leak and probability of attack success.

Risks of a taint source with various spreads of taint propagation.
Conclusion
This article proposes a threat analysis model for solving the mobile security problem by performing in-depth behavioural analysis on the ComDroid platform to indicate the attack profile for malware infection, considering both code analysis and behaviour analysis for program vulnerabilities in a smart home system. In the proposed model, an improved DTPA scheme was obtained by revising the approach in Newsome and Song 4 and used as an enhancement method for the DTA of cloud security. Our scheme enables a defender to convert the spread of possible taint paths to loss and practically estimate the risk of a specific threat. Additionally, the proposed scheme not only considers the taint source, PRs and SOEs from tainted data to system vulnerability but also estimates the losses when tainted data are propagated. Consequently, the proposed method improves the defence reactions to a risk associated with the evolution of a system’s security concerns and assists defenders in making appropriate decisions when tracing possible taint sources.
Footnotes
Academic Editor: Suat Ozdemir
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported jointly by TWISC@NCKU and the Ministry of Science and Technology of Taiwan under grant nos MOST 104-2632-E-168-001, MOST 104-2218-E-001-002 and MOST 105-2410-H-168-002.
