Abstract
Network coding has shown a considerable improvement in terms of capacity and robustness compared to traditional store-and-forward transmission paradigm. However, since the intermediate nodes in network coding-enabled networks have the ability to change the packets en route, network coding-enabled networks are vulnerable to pollution attacks where a small number of polluted messages can corrupt bunches of legitimate messages. Recently, research effort has been put on schemes for protecting the transmitted messages against data pollution attacks. However, most of them cannot resist tag pollution attacks. This paper presents a new homomorphic MAC-based scheme, called Dual-Homomorphic MAC (Dual-HMAC), for network coding-enabled wireless sensor networks. The proposed scheme makes use of two types of tags (i.e., MACs and D-MACs) to provide resistance against data pollution attacks and partially tag pollution attacks. Furthermore, our proposed scheme presents low communication overhead and low computational complexity compared to other existing schemes.
1. Introduction
Wireless sensor networks (WSNs) have been recently used in a plethora of applications of many different areas, such as surveillance monitoring, environmental monitoring, traffic control, natural disaster prevention, and e-health. Due to their wide range of applications, they have already become so attractive to research community. However, WSNs are characterized by low communication bandwidth and power consumption constraints [1]. Thus, network coding (NC) can be a promising solution for WSNs, since it is a technique that can provide network capacity improvement [2] and lower energy consumption [3].
NC was introduced by Ahlswede et al. [2]. Nowadays, NC has been used in various applications over different networks, such as wireless mesh networks [4], wireless sensor networks [5], and peer-to-peer systems [6]. In contrast to traditional and classical commodity flow, where information is only routed or replicated, information flow can also employ coding operations at intermediate nodes. In [7], random linear network coding (RLNC) was studied as a fully distributed method for performing network coding. Also, in [7], it is mentioned that there is a possibility for each node in the network to select a set of coefficients independently and randomly and use them to make linear combinations of the data symbols.
However, NC is more susceptible to data pollution attack than the traditional store-and-forward approach. If a data pollution attack is not detected at the forwarders, then the sink nodes will not be able to recover the source messages correctly. It is worthwhile to mention that even a small number of polluted messages can infect a large number of downstream nodes because the pollution propagates via recoding. Furthermore, NC-enabled networks are also vulnerable to tag pollution attack, which is a more sophisticated type of pollution attack. In tag pollution attack, the adversary targets at modifying (i.e., polluting) the tags carried by the messages rather than modifying the content of the messages. It is possible for a message with polluted tags to travel multiple hops until it is detected and discarded. Yet, this results in a waste of network bandwidth.
Thus far, several cryptographic schemes such as homomorphic hash functions, signatures, and MACs have been presented to secure network coding against data and tag pollution attacks (e.g., [8, 9]). Furthermore, information-theoretic schemes have been also proposed (e.g., [10, 11]) to detect data pollution attacks. Although information-theoretic schemes are more efficient, in terms of computation, compared to the cryptographic schemes, they can only detect polluted packets at the sink nodes.
Among all the above mentioned proposed schemes, Homomorphic MAC is considered as a low-complexity solution against data pollution [12, 13]. Specifically, a MAC or tag is a small piece of information appended to the end of the message packet. This piece of information is the output of a MAC function taking as inputs the message packet and a secret key. However, this solution is vulnerable to tag pollution attacks.
In this paper, we propose a new homomorphic MAC-based scheme, called Dual-Homomorphic MAC (Dual-HMAC), for NC-enabled WSNs, in order to mitigate both data pollution attacks and partially tag pollution attacks. In our scheme, the source generates multiple MACs and Dual MACs (D-MACs) for each message. Each MAC ensures integrity of the transmitted message and each D-MAC ensures integrity of the MACs. In other words, by appending D-MACs, the proposed scheme mitigates partially tag pollution attacks.
The rest of the paper is outlined as follows. In Section 2, we give an overview of the related work on security schemes against data pollution attacks and tag pollution attacks in NC-enabled networks. In Section 3, the problem statement is discussed. In Section 4, our proposed Dual-HMAC scheme is presented. In Section 5, the security analysis of the proposed scheme takes place. In Section 6, its performance evaluation is given. In Section 7, a comparison among the communication and computation overhead of our work (i.e., Dual-HMAC) and the communication and computation overhead of the related works in [9, 14] is provided. Finally, Section 8 concludes the paper.
2. Related Work
In this section, we present related work on security schemes against pollution attacks in NC-enabled networks. In addition, we present the key distribution mechanisms used by these security schemes.
2.1. Security Schemes against Pollution Attacks in NC-Enabled Networks
Most of the research in the field of pollution attacks in network coding is focused on two types of security schemes: information-theoretic schemes [10, 11] and cryptographic schemes [14, 15].
2.1.1. Cryptographic Schemes
These schemes rely on additional verification information which is added by the sources through cryptographic techniques. This allows intermediate nodes to verify the original messages and filter out the polluted ones. This category includes the homomorphic hashing schemes, the homomorphic signature schemes, and the homomorphic MACs.
(a) Homomorphic Hashing Schemes. Homomorphic hashing schemes are based on a homomorphic hash function which is applied by the source to the messages. These schemes rely also on additional secure communication channels in order to transmit the calculated hash values to the intermediate nodes. A well-known homomorphic hashing scheme is proposed by Krohn et al. in [8] which enables the intermediate nodes to verify on-the-fly. The extension of this work was presented in [16] which is focused on network coding-enabled networks and further reduces the expensive computation cost of hash functions in each intermediate node by enabling cooperation verification.
(b) Homomorphic Signature Schemes. Homomorphic signature schemes were introduced for the first time by Charles et al. in [17] and have been based on Weil pairing over elliptic curves. The homomorphic property of the signatures in these schemes allows nodes to sign any linear combination of the incoming packets without contacting the signing entity. The calculation of signature covers the whole augmented message. In [18], Yu et al. proposed a homomorphic signature function which allows the relay nodes to verify the received messages by generating the signatures without contacting the signing entities. In Yu's scheme, the communication does not need any extra secure communication channel. Their experimental results show that its verification efficiency is improved up to 10 times compared to the other related work (i.e., CJLs scheme [17]). Moreover, in [10], an alternative lightweight scheme is proposed. It is much faster rather than Yu's scheme, but it is not as secure as the first one.
(c) Homomorphic MACs. Homomorphic MACs are based on appending some extra information to the original message. The basic definition of homomorphic MAC is defined by Agrawal and Boneh in [14] which allows the integrity checking of the network coded data. However, this idea is susceptible to tag pollution attacks. Kehdi and Baochun [19] presented a homomorphic MAC scheme to detect pollution attacks based on the subspace properties of random linear network coding, where null keys for verification are used. As a lot of null keys are used in each generation, a high bandwidth overhead can be incurred. The authors in [13] proposed an idea where the source attaches multiple MACs to each packet. This idea can be applied for XOR network coding networks; however it is vulnerable to tag pollution attacks. RIPPLE [20] was proposed to counteract against the tag pollution problem. This work is based on a symmetric key based scheme for network coding authentication. RIPPLE allows a node to efficiently detect corrupted packets and encode only the authenticated (i.e., verified) ones. However, the global synchronization among all nodes is the problem of this scheme. In [9], the authors presented a hybrid-key cryptography approach which could achieve authentication in the presence of both data and tag pollution attacks. To achieve this, a number of tags and a signature are appended to each transmitted message. However, the signature which is necessary for preventing tag pollution is a time consuming process.
2.1.2. Information-Theoretic Schemes
The information-theoretic schemes benefit from the reasonable computation performance in comparing to the cryptographic schemes, but they have two main drawbacks. First of all, they are not able to detect the polluted messages at the intermediate nodes of the network. For example, the work presented in [10], which extends the random network coding, relies on coding redundant information into the original messages allowing only the receivers to detect the polluted messages. Furthermore, the information-theoretic schemes are characterized by the limitation on the number of compromised nodes. These two drawbacks comprise the main motivation to interest to the cryptographic schemes.
2.2. Key Distribution Mechanisms in NC-Enabled Networks
Nowadays, in emerging multicast communication systems (e.g., pay TV and video-conferencing services), the key distribution is a challenging issue. There are various proposed schemes for key distribution in multicast systems. Key distribution models can be categorized in trusted-server schemes [21], public-key schemes, and key predistribution schemes [22]. The trusted-server and public-key schemes are infeasible to be used in WSNs as the sensors have limitations in terms of energy and computing processes. On the other hand, key predistribution schemes are feasible to be used in WSNs. In these schemes, the shared keys are distributed among the nodes of a group in such a way that every node in the group is able to compute individually a common key associated with that group. Two examples of key predistribution schemes are given as follows.
In MacSig which is an efficient subspace authentication scheme for NC and presented in [9], the double-random key distribution has been proposed. This proposed keys distribution model includes two random procedures. First of all, they assign each node with a random set of keys; next, the source node randomly selects a subset of keys from the key pools and uses these keys to generate the MACs. However, it is necessary to attach the indexes of these keys to each packet. Hence, a considerable bandwidth overhead is incurred.
In KEPTE (key predistribution-based tag encoding) scheme, a key distribution model for practical NC has been proposed in [23]. According to this key distribution model, a Key Distribution Center (KDC) is responsible for the distribution of the keys. The KDC allocates N secret vectors at the source node to produce N tags. Moreover, it assigns two secret vectors to each node to check the correctness of its received packets.
3. Problem Statement
In this section, we provide our problem statement through a network coding-enabled WSN model and its corresponding adversary model.
3.1. Network Coding-Enabled WSN Model
We consider a network coding-enabled WSN, as it is depicted in Figure 1, consisting of a source node S, a number of intermediate nodes, and a set of sink nodes. Random linear network coding is exploited in this network. In order to model our network, we define a triple directed multigraph G: we consider a pair of V, E as a directed acyclic graph where V and E are the node sets and edge sets of G, respectively; source node S: in our network model, we have a source S that wants to multicast its messages; non-source node R: we define relay and sink nodes in a set of nodes which is defined as:
Notations section summarizes the main notations used in this paper.

A general multicast network.
Prior to transmission, the source node partitions message packets into generations. Each generation consists of m messages packets denoted as
3.2. Adversary Model
For the above mentioned network coding-enabled model, we consider that the source is always trusted and there is no possibility to forge it, but the intermediate and sink nodes can be compromised. The adversary is able to wiretap all the data packets that are transmitted over a network. The adversary's goal is to achieve pollution attack. There are two types of pollution attack.
Data Pollution Attack. An adversary can inject fake data packets into the network. The objective of data pollution attack is to pass the verification of other innocent nodes and to cause incorrect decoding at the sink node, as well as wasting of bandwidth.
Tag Pollution Attack. An adversary injects a corrupted packet consisting of correct data but modified tags to the network. The objective of tag pollution is to discard the correct data packets due to the corresponding corrupted tags. This results in the waste of bandwidth. A basic scenario of tag pollution attack is presented in Figure 2.

A basic scenario of tag pollution. We consider a scenario which has a packet and 4 tags. We consider a node A as an adversary that pollutes the second tag
4. Proposed Scheme: Dual-Homomorphic MAC (Dual-HMAC)
In this section, we present our Dual-HMAC scheme, which mitigates data pollution and partially tags pollution attacks for a network coding-enabled WSN. The Dual-HMAC scheme's outline, the construction and the correctness of the scheme, and the key distribution model are provided in this section.
Our proposed solution is based on the homomorphic MAC solution, defined by Agrawal and Boneh [14], and the MacSig scheme proposed in [9]. Additionally, our proposed scheme makes use of additional MACs, termed as D-MACs, in order to resist data pollution and partially tag pollution attacks. In contrast to the homomorphic MAC solution in [14], our proposed scheme is not susceptible to tag pollution attacks. Particularly, our proposed scheme can achieve resistance for the half of the tags against tag pollution attack with significant low computational complexity compared to the MacSig scheme, which uses a signature scheme characterized by a much considerable computational overhead.
Our scheme consists of three steps, as it is depicted in Figure 3, in order to generate the required MACs and D-MACs. In step 1, it computes a number of MACs (i.e., tags), for each message, according to the keys which are chosen randomly from the first pool of keys. Then, in step 2, it computes a number of D-MACs for the initial MACs, according to the keys which are chosen from the second pool of keys. Finally, in step 3, the computed MACs and D-MACs are appended to the message. More specifically, the main idea of our work is based on the orthogonal subspace property [9].

The basic idea of Dual-HMAC scheme which is related to the padding orthogonality property.
4.1. The Outline of the Dual-HMAC Scheme
In this section, we provide description of the proposed Dual-HMAC scheme with a formal method. This scheme includes four steps: setup, tag generation, verification, and encoding detailed as follows.
4.1.1. Setup
Key Distribution Centre (KDC) distributes the following two sets of secrete key vectors to the source node S: Note that Note that the subscriptions l,
The detail of probabilistic key distribution algorithm will be described in Section 4.4, as well as how to choose the key set size.
4.1.2. Tag Generation
For every data packet
4.1.3. Verification
When a relay or sink node receives a coded packet
4.1.4. Encoding
When an intermediate node receives h encoded packets
4.2. The Construction
In this subsection, we provide the construction of the scheme. Particularly, we present the above-mentioned four algorithms, namely, MAC, D-MAC, Combine, and Verify, in detail. The algorithm MAC computes L tags for each data packet
4.2.1.
Input: Output: L tags
4.2.2.
Input: Output:
4.2.3.
Input: h vectors Output: a coded packet with
4.2.4.
Input: Output: if
then output 1; otherwise output 0.
4.3. The Correctness of Dual-HMAC
The proposed Dual-HMAC is going to be correct if the Verify algorithm could produce output 1 for the valid data packet and the valid RLNC encoded packet. Rigorously speaking, it should satisfy the following condition:
If If
Theorem 1.
The first verification is correct.
Proof.
According to the description of “Setup,” since
Similarly, since
As a result,
Theorem 2.
The second verification is correct.
Proof.
According to (3), the tags are created in a way that
If
4.4. Key Distribution Model
In our work, we assume that a set of symmetric keys are distributed to all participant nodes in a secure and authenticated manner through a key distribution scheme. In this section, we provide the model of the key distribution scheme that we have adopted.
A set system is a pair
Definition A.
A set system
Definition B.
A set system
5. Security Analysis
In this section, we analyse the possibilities of an adversary launching pollution attack successfully. Hence, three possible pollution attack behaviours are analysed, and their security levels are quantified, respectively. We summarize our analysis in Table 1. Finally, the possible knowledge that an adversary can access is analysed.
Security analysis.
A representation of adversary behavior. Max hops shows the maximum hops which the polluted packet can travel through the network. We consider the worst case when the adversary node can have access to all the next
5.1. The Resistance against Pollution Attack
We consider the following three possible pollution attack scenarios.
5.1.1. 1st Pollution Behaviour: Data Pollution Attack
An adversary node intends to make changes in a message. This polluted message can be detected immediately by the next hop, or in a few hops later (maximum
5.1.2. 2nd Pollution Behaviour: Tag (i.e., MAC) Pollution Attack
An adversary node intends to make changes in the MACs. This attack can be detected immediately by the next hop if he cannot compromise the next hop and get the next hop keys, or in a few hops later (maximum
5.1.3. 3rd Pollution Behaviour: Dual-Tag (i.e., D-MAC) Pollution Attack
An adversary i which is one of the N legitimate nodes (
Theorem 3.
The probability that a downstream user receives the same key as
Proof.
The probability of recovering the same shared key between two nodes is
5.2. The Possible Knowledge of an Adversary
As outlined in Section 3.2, we assume the source node is trustworthy and the process of key predistribution is secure. Hence, the secret key sets,
6. Performance Evaluation
In this section, communication and computation overhead of the proposed Dual-HMAC scheme are presented.
6.1. Communication Overhead
In our scheme, each source message has
Hence, the Dual-HMAC scheme's overhead is equal to the scheme presented in [14]. However this scheme is vulnerable to tag pollution. Additionally, the bandwidth overhead between our scheme and the scheme proposed in [9] is given by the following equation:
Base on (11), the bandwidth overhead comparison between our scheme and the scheme described in [9], for different values of c, is depicted in the graphs included in Figure 4.

The bandwidth overhead comparison. By fixing
6.2. Computation Overhead
Recall that, to append L MACs and
7. Discussion
The number of appended tags which is needed in [14] is
Moreover, the works in [9, 14] need to use
To conclude this section, consider the following.
Dual-HMAC incurs a relatively low communication overhead which is almost equal to [14]; however, our scheme's overhead, according to (11), saves up to Dual-HMAC not only slightly increases the tag pollution resistance, but also decreases the computation overhead compared to the schemes in [9, 14]. Dual-HMAC provides resistance for half of the tags against tag pollution attack. This results in reducing the possibility of tag pollution around 50% compared to the work presented in [14]. Dual-HMAC uses symmetric keys, in contrast to MacSig scheme using symmetric and public keys.
8. Conclusion
Network coding-enabled WSNs are susceptible to pollution attack where a small number of polluted messages can corrupt bunches of legitimate messages. This paper has presented a homomorphic MAC-based scheme, called Dual-HMAC, for network coding-enabled WSNs. The proposed scheme makes use of two types of tags to provide resistance against data pollution attacks and partially tag pollution attacks. The partial protection against tag pollution is related to the key predistribution model. Furthermore, our proposed scheme presents low communication overhead and low computational overhead compared to other existing schemes. Finally, as future work, we plan to propose a scheme that will mitigate not only partially tag pollution attacks but also tag pollution attacks.
Footnotes
Notations
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The research leading to these results has received funding from National Funds through FCT, Fundação para a Ciência e a Tecnologia, under the project PTDC/EEA-TEL/119228/2010 SMARTVISION, and from the European Community's Seventh Framework Programme [FP7/2007–2013] under Grant Agreement no. 285969 [CODELANCE].
