MD-POR: Multisource and Direct Repair for Network Coding-Based Proof of Retrievability

Abstract

When data owners publish their data to cloud storage, data integrity and availability become typical problems because the cloud servers are untrusted. To address these problems, researchers proposed the Proof of Retrievability (POR) protocol, which allows a verifier to check and repair the data stored in the cloud servers. Network coding is commonly applied on top of the POR protocol to increase the efficiency of data transmission and data repair. However, most previous schemes neither consider a practical scenario nor use network coding efficiently. In this paper, a lightweight network coding-based POR scheme called MD-POR (Multisource and Direct Repair for Proof of Retrievability) is proposed. Unlike previous schemes, the proposed MD-POR scheme allows multiple clients who hold different secret keys to participate in the scheme. Moreover, the MD-POR scheme supports a direct repair feature in which corrupted data can be recovered by the servers without burdening the clients. The MD-POR scheme also supports a public authentication feature in which a third party auditor is employed to check the servers, so the clients are freed from the responsibility of periodically checking the servers. Furthermore, the MD-POR scheme is constructed in a symmetric key setting.

1. Introduction

Since data is increasing exponentially, database owners tend to publish their data to storage providers, called clouds, in order to reduce the burden of data storage and maintenance. Clients can thus access, manage, and share their data from anywhere via the Internet. However, such service providers are untrustworthy and present three basic challenges to data security: (i) integrity, (ii) availability, and (iii) confidentiality. For confidentiality, there are two research approaches: the cryptographic approach (e.g., RSA) and the information-theoretic approach (e.g., secret sharing schemes). Compared to the cryptographic approach, the information-theoretic approach achieves a security level determined by a threshold. We choose the information-theoretic approach because our security analysis derives purely from information theory. In this paper, we deal with integrity, availability, and information-theoretic confidentiality.

To check the cloud servers, researchers proposed the Proof of Retrievability (POR) protocol [1–3], which enables the servers (provers) to demonstrate to the verifier that the stored data is intact and available, and enables the clients to recover the data when an error is detected. Within the POR protocol, integrity and availability assurance rely mainly on three techniques: replication [4], erasure coding [5], and network coding [6–9]. In the replication technique, the client stores a file replica in each server. When a corrupted server is detected, the client uses one of the healthy replicas to repair it. The drawback of this technique is its high storage cost because the client must store the whole file in each server. Erasure coding reduces the storage cost: the client stores redundant file blocks in each server instead of whole replicas. However, to repair corrupted data, the client has to retrieve the entire original file before generating new coded blocks, which increases the computation and communication costs of data repair. Network coding improves the efficiency of data repair because the client does not need to retrieve the entire file before generating new coded blocks. Consequently, in this paper, we focus on the network coding technique. Our goal is to construct a network-coding POR which satisfies the following aims.

(i) Practical scenario: the system should consist of multiple clients, each of whom keeps a different secret key. In many distributed storage systems today, such as Dropbox, each client has personal data; hence, each client should use his own secret key to satisfy integrity and confidentiality.

(ii) Lightweight: firstly, the clients should be free of the two heaviest tasks: periodically checking the servers and repairing the corrupted servers. Secondly, the system should be constructed in a symmetric key setting, which is generally more lightweight than an asymmetric key setting.

Network Coding-Based POR Schemes. A few notable network-coding PORs have been proposed. Dimakis et al. [10] were the first to apply network coding to distributed storage systems. Li et al. [11] proposed a tree-structured data regeneration for network coding that optimizes network bandwidth by using a maximum spanning tree. Chen et al. [12] then adapted the scheme of Dimakis et al. to propose the Remote Data Checking for Network Coding-based distributed storage system (RDC-NC) scheme, which provides an elegant data repair by recoding encoded blocks in healthy servers during repair. Cao et al. [13] applied the Luby transform (LT) code to reduce the computation cost, because the LT code is a special network code that works in the finite field of order two and only uses the exclusive-OR (XOR) operation. Chen et al. [14] proposed the NC-Cloud scheme to improve the cost-effectiveness of repair using the functional minimum storage regenerating (FMSR) code and to lighten the encoding requirement of storage nodes during repair. However, none of these schemes meets our aims. Their system models only have a single client. Furthermore, the check and repair phases in these schemes place a heavy burden on the client because (i) the client has to periodically check the servers and (ii) when a corrupted server is detected, the healthy servers provide their blocks to the client; the client then has to verify them, compute the new blocks, and send these new blocks to the new server. Le and Markopoulou later proposed the NC-Audit scheme [15], in which a third party auditor is delegated the responsibility of checking the servers instead of the client. The authors also discussed a new repair mechanism in which the new server can compute the new blocks by itself without the need of the client. We call this mechanism direct repair. Unfortunately, their direct repair is incomplete because they mainly focused on how to prevent data leakage to the third party auditor. Furthermore, their scheme is constructed in an asymmetric key setting and does not deal with multiple clients.

Contribution. In this paper, a new network-coding POR named as MD-POR is proposed. To the best of our knowledge, we are the first to propose a symmetric key setting-based direct repair for the POR; furthermore, the proposed MD-POR scheme also supports multiclient and public authentication. (i)

Direct Repair. If a corrupted server is detected, the healthy servers are required to provide their coded blocks directly to the new server instead of sending these coded blocks back to the client. Afterwards, the new server verifies the coded blocks it received and computes the new coded blocks for itself without disturbing the client. This mechanism can reduce the communication cost and the burden for the client.

(ii)

Multiclient. To enable multiple clients, our method does not simply duplicate the single-client process into parallel processes for multiple clients. Instead, in the proposed MD-POR scheme, the processes of multiple clients are mixed together without losing the data confidentiality of individual clients. To enable such a multiclient setting, we employ the InterMac technique [16], which was proposed for the network scenario. The InterMac technique allows multiple sources to send their packets to the network using different secret keys and allows the recipients to verify the packets they receive.

(iii)

Symmetric Key Setting. The MD-POR scheme uses only secret keys without any public key, unlike an asymmetric key setting.

(iv)

Public Authentication. Not only the client but also any entity holding the given verification information can check the cloud servers while learning nothing about the secret key of each client. We employ a third party auditor (TPA) to check the servers periodically on behalf of the clients. By delegating the responsibility of checking the servers to the TPA, the clients are freed from this burden. Without a TPA, the clients would have to periodically check the servers themselves, and public authentication could not be supported because only the clients could check the servers. Although the MD-POR scheme supports public authentication, our method does not use an asymmetric key setting.

Organization. The system model, the backgrounds of the Proof of Retrievability, the network coding technique, the InterMac technique, the notations, and definitions are described in Section 2. The adversarial model is presented in Section 3. The MD-POR scheme is proposed in Section 4. The security analysis and efficiency analysis are given in Section 5 and Section 6, respectively. The performance evaluation of the MD-POR scheme is shown in Section 7. The conclusion and future work are drawn in Section 8.

2. Preliminaries

2.1. System Model

The system model of the MD-POR scheme is depicted in Figure 1. There are three types of entities. (i)

Clients: these entities have data to be stored in the cloud and rely on the cloud for data storage, computation, and maintenance. These clients can be either enterprises or individual customers.

(ii)

Cloud servers: the cloud servers are managed and monitored by a cloud service provider to accommodate a service of data storage and have significant and unlimited storage space and computation resources. In the cloud storage service, the clients can store their data into a set of servers in a simultaneous and distributed manner.

(iii)

Third party auditor (TPA): this entity is delegated the responsibility to check the servers on behalf of the clients. The TPA is assumed to be trusted to perform the task of periodically checking the servers.

Figure 1

System model.

Originally, a system model consisting of only the clients and the servers, without the TPA, is sufficient for data checking. To enable the public authentication feature, the TPA is employed under the assumption that it is an honest-but-curious entity. Several previous papers make the same assumption about the TPA, for example, [15, 17–19].

2.2. Proof of Retrievability (POR)

To check the servers, researchers proposed the Proof of Retrievability (POR) [1–3], which is a challenge-response protocol between a verifier (client) and a prover (server). The POR has four phases as follows.

(1) keygen $(1^{λ})$: given a security parameter λ, the client runs this algorithm to generate a secret key ($sk$) and a public key ($pk$). For the symmetric key setting, $pk$ is set to $\text{null}$.

(2) encode $(sk, F)$: the client runs this algorithm to encode an original file (F) to an encoded file ($F^{'}$) and then sends $F^{'}$ to the server to store.

(3) check $(sk)$: the client uses his secret key $sk$ to generate a challenge (c) and sends c to the server. The server then computes a response (r) and sends r back to the client. Finally, the client verifies whether the file F is intact based on c and r.

(4) repair(): the client runs this algorithm only when a failure is detected in the check phase. The repair procedure depends on the underlying technique, for example, replication, erasure coding, or network coding.

To suit our system model, we modify the POR such that the verifier is the TPA and there are multiple clients, as follows.

(1) keygen $(1^{λ})$: given a security parameter λ, the algorithm generates a set of secret keys $\{sk_{i}\}_{i \in \{1, \dots, s\}}$ for s clients and a secret key κ for the TPA.

(2) encode $(sk_{i}, F_{i})$: each client i uses his secret key $sk_{i}$ to encode his original file $F_{i}$ to an encoded file $F_{i}^{'}$ and then sends $F_{i}^{'}$ to the servers. Each server then linearly combines all $F_{i}^{'}$ ($i \in \{1, \dots, s\}$) and stores the combined blocks.

(3) check $(κ)$: the TPA uses his key κ to generate a challenge c and sends c to the servers. Each server then computes a response r and sends r back to the TPA. Finally, the TPA verifies whether each $F_{i}$ is intact or not.

(4) repair(): this algorithm is executed when a failure is detected in the check phase. The repair procedure depends on each specific scheme.

2.3. Network Coding

Network coding [6–9] is commonly used in network transmission to obtain a good trade-off in terms of bandwidth and data repair. It was first proposed for the network scenario and was later applied to the distributed storage system scenario.

Fundamental Concept. In the network scenario, suppose that a source node C wants to send its message to a receiver node R. Before transmitting, C breaks the message into m blocks $v_{1}, \dots, v_{m}$; each file block belongs to $F_{q}^{n}$, where $F_{q}^{n}$ denotes an n-dimensional vector space over the finite field $F_{q}$ of prime order q. C augments each file block $v_{i}$ ($i \in \{1, \dots, m\}$) with a vector of length m in which a single “1” is in the ith position and “0”s are elsewhere. Let $w_{1}, \dots, w_{m}$ be the augmented blocks. Each augmented block has the following form:

\begin{matrix} w_{i} = (v_{i}, \overbrace{\underbrace{0, \dots, 0, 1}_{i}, 0, \dots, 0}^{m}) \in F_{q}^{n + m} . \end{matrix}

(1)

These augmented blocks are then sent as packets to the network. When an intermediate node I in the network receives t packets, I generates t coefficients, linearly combines the t packets using the generated coefficients, and transmits the result to its adjacent nodes. Consequently, the receiver node R receives linear combinations of the augmented blocks. R can recover the m augmented blocks from any set of m linearly independent combinations. Suppose that R receives m packets $y_{1}, \dots, y_{m} \in F_{q}^{n + m}$; R solves for the m augmented blocks $w_{1}, \dots, w_{m} \in F_{q}^{n + m}$ using the accumulated coefficients, which are contained in the last m coordinates of each packet y. Afterwards, the file blocks $v_{1}, \dots, v_{m}$ are obtained from the first n coordinates of each augmented block. Finally, the original message is reconstructed by concatenating all file blocks.

Application in Distributed Storage System. In the network scenario described above, there are multiple types of entities: a source node, intermediate nodes, and a receiver node. When network coding is applied to the distributed storage system scenario, however, there are two types of entities: a client and servers. Suppose that a client has the original file F, which consists of m file blocks ($v_{1}, \dots, v_{m}$). The client wants to store redundantly encoded blocks in the servers in such a way that the client can reconstruct the original file F and can repair the encoded blocks of a corrupted server. From these file blocks, the client first creates m augmented blocks ($w_{1}, \dots, w_{m}$). For each coded block, the client chooses m coding coefficients ($α_{1}, \dots, α_{m} \in F_{q}$) and computes the linear combination $c = \sum_{i = 1}^{m} α_{i} \cdot w_{i}$; the client then stores these coded blocks in the servers. To reconstruct the original file F, any m coded blocks are used to solve for the m augmented blocks $w_{1}, \dots, w_{m}$ using the accumulated coefficients contained in the last m coordinates of each coded block. After these m augmented blocks are solved, the m file blocks $v_{1}, \dots, v_{m}$ are obtained from the first n coordinates of each augmented block. Finally, the original file F is reconstructed by concatenating the file blocks. Note that the matrix consisting of the coefficients used to construct any m coded blocks should have full rank. Koetter and Medard [20] proved that if the prime q is chosen large enough and the coefficients are chosen randomly, the probability that the matrix has full rank is high. Once a corrupted server is detected, the client repairs it as follows: the client retrieves coded blocks from the healthy servers and linearly combines them to regenerate new coded blocks. An example of data repair with network coding is given in Figure 2.

Figure 2

From three augmented blocks $\{w_{1}, w_{2}, w_{3}\}$, the client computes six coded blocks and stores two coded blocks in each of the servers $S_{1}$, $S_{2}$, and $S_{3}$. Suppose that $S_{3}$ is corrupted. The client asks $S_{1}$ and $S_{2}$ to create new blocks by linear combination, then mixes them by linear combination to obtain two new coded blocks and stores them in the new server.
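The augment, encode, reconstruct, and repair steps of this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the prime `Q = 257`, the helper names `augment`, `combine`, and `decode`, and the fixed coefficient rows in the demo are hypothetical choices made for readability (the scheme itself draws the coefficients at random).

```python
Q = 257  # small illustrative prime (assumption); a real system uses a large prime

def augment(blocks):
    """w_i = (v_i, unit vector of length m with a 1 in position i)."""
    m = len(blocks)
    return [list(v) + [int(j == i) for j in range(m)] for i, v in enumerate(blocks)]

def combine(vectors, coeffs, q=Q):
    """Linear combination sum_i coeffs[i] * vectors[i] over F_q."""
    return [sum(a * vec[k] for a, vec in zip(coeffs, vectors)) % q
            for k in range(len(vectors[0]))]

def decode(coded, n, q=Q):
    """Recover the m file blocks from m coded blocks.

    Each coded block is (data part of length n, accumulated coefficients of
    length m). Gauss-Jordan elimination on [coefficients | data] yields
    [identity | file blocks].
    """
    m = len(coded)
    rows = [c[n:] + c[:n] for c in coded]
    for col in range(m):
        pivot = next((r for r in range(col, m) if rows[r][col] % q), None)
        if pivot is None:
            raise ValueError("coefficient matrix does not have full rank")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, q)  # modular inverse (Python 3.8+)
        rows[col] = [x * inv % q for x in rows[col]]
        for r in range(m):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % q for x, y in zip(rows[r], rows[col])]
    return [row[m:] for row in rows]

# Demo with fixed (hypothetical) coefficient rows so the run is reproducible.
file_blocks = [[1, 2], [3, 4], [5, 6]]
aug = augment(file_blocks)
coded = [combine(aug, row) for row in ([1, 1, 1], [1, 2, 4], [1, 3, 9], [1, 4, 16])]
recovered = decode(coded[:3], n=2)

# Repair: a new coded block is a combination of blocks held by a healthy server.
new_block = combine(coded[0:2], [2, 5])
recovered_after_repair = decode([new_block, coded[2], coded[3]], n=2)
```

The repaired block carries its own accumulated coefficients in its last m coordinates, which is why it can be used for reconstruction without the client ever retrieving the whole file.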

2.4. InterMac

Before describing how the InterMac works, we explain why it is used in the proposed MD-POR scheme. We consider a network in which multiple sources are supported simultaneously and each source owns a different secret key. The data of each source cannot be checked alone. Instead, each source uses its secret key to compute additional information, a Message Authentication Code (MAC), for each data block. A MAC is also called a tag. Each source then transmits packets consisting of the data blocks and the corresponding tags to the adjacent nodes in the network. A node in the network linearly combines the received blocks and the homomorphic tags. Herein lies the difficulty: when a recipient node receives a packet, how can it verify the received linear combination of blocks against the linearly homomorphic tag without any information about any of the secret keys? Traditional methods such as MAC or HMAC are inadequate for this task. Some recent schemes related to this problem have been proposed, for example, [21–23]; unfortunately, they all use an asymmetric key setting, which is not our aim.

The InterMac technique [16] is a suitable technique to generate such secret keys for multiple sources. The characteristic of this technique is that the key of the source $C_{p}$ ( $p \in {1, \dots, s}$ where s denotes the number of sources) is orthogonal to all the augmented blocks which do not belong to $C_{p}$ . This characteristic can help the verifier check the received packets without needing the information on any of the secret keys.

Construction. Let $w_{11}, \dots, w_{s g} \in F_{q}^{n + m}$ be the augmented blocks, with span π, represented as row vectors (where s denotes the number of sources, g denotes the number of file blocks per source, and $m = s \cdot g$). For each $p \in \{1, \dots, s\}$, let $M_{p}$ be the matrix whose rows are the vectors in the following set:

\begin{matrix} \{w_{i j} | i = 1, \dots, s; i \neq p; j = 1, \dots, g\} . \end{matrix}

(2)

In other words, $M_{p}$ is the matrix consisting of the augmented blocks of all sources except $C_{p}$, so $\operatorname{rank} (M_{p}) = m - g$. Let $π_{M_{p}}$ denote the space spanned by the rows of $M_{p}$.

The null space of $M_{p}$ , denoted as $π_{M_{p}}^{⊥}$ , is the set of all row vectors $z \in F_{q}^{n + m}$ for which $M_{p} z^{T} = 0$ . For any $(m - g) \times (n + m)$ matrix $M_{p}$ , we have

\begin{matrix} \operatorname{rank} (M_{p}) + \operatorname{nullity} (M_{p}) = n + m, \end{matrix}

(3)

known as the rank-nullity theorem, where $\operatorname{nullity} (M_{p})$ is the dimension of $π_{M_{p}}^{⊥}$. Hence,

\begin{matrix} \dim (π_{M_{p}}^{⊥}) = n + m - (m - g) = n + g . \end{matrix}

(4)

Let $b_{1}, \dots, b_{n + g} \in F_{q}^{n + m}$ be a basis of $π_{M_{p}}^{⊥}$. This basis can be found by solving $M_{p} z^{T} = 0$. Let F be a pseudorandom function (PRF): $\bar{K} \times ([1, s] \times [1, n + g]) \to F_{q}$. A key $k_{p}$ for the source $C_{p}$ is computed as follows, where $k \in \bar{K}$ is the PRF key:

(i) $r_{i} \leftarrow F (k, p, i) \in F_{q}, \forall i \in \{1, \dots, n + g\}$;

(ii) $k_{p} \leftarrow \sum_{i = 1}^{n + g} r_{i} b_{i} \in F_{q}^{n + m}$.

Eventually, a key set ${k_{1}, \dots, k_{s}}$ is generated in which each key $k_{p}$ where $p \in {1, \dots, s}$ is constructed as above.
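The key construction above can be sketched in Python as follows. This is a hedged illustration rather than the reference InterMac implementation: HMAC-SHA256 reduced modulo q is used as a stand-in for the PRF F (the small modulo bias is ignored), and the helper names `augment_all`, `null_space_basis`, `prf`, and `intermac_key` are hypothetical.

```python
import hashlib
import hmac

def augment_all(files):
    """aug[i][j] is w_{(i+1)(j+1)}: block v_ij followed by a unit vector of length m = s*g."""
    s, g = len(files), len(files[0])
    return [[list(v) + [int(c == i * g + j) for c in range(s * g)]
             for j, v in enumerate(blocks)] for i, blocks in enumerate(files)]

def null_space_basis(matrix, q):
    """Basis of {z : M z^T = 0 over F_q}, read off the reduced row echelon form."""
    rows = [[x % q for x in r] for r in matrix]
    ncols, pivots, r = len(rows[0]), [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue  # no pivot in this column: it is a free column
        rows[r], rows[p] = rows[p], rows[r]
        inv = pow(rows[r][c], -1, q)
        rows[r] = [x * inv % q for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(x - f * y) % q for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for fc in (c for c in range(ncols) if c not in pivots):
        z = [0] * ncols
        z[fc] = 1
        for row_idx, pc in enumerate(pivots):
            z[pc] = (-rows[row_idx][fc]) % q
        basis.append(z)
    return basis

def prf(key, p, i, q):
    """Stand-in PRF (assumption): HMAC-SHA256 of (p, i), reduced mod q."""
    digest = hmac.new(key, f"{p}|{i}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % q

def intermac_key(aug, p, key, q):
    """k_p = sum_i r_i * b_i, where b_i spans the null space of M_p (p is 1-based)."""
    m_p = [w for i, blocks in enumerate(aug, start=1) if i != p for w in blocks]
    basis = null_space_basis(m_p, q)
    k_p = [0] * len(basis[0])
    for i, b in enumerate(basis, start=1):
        r_i = prf(key, p, i, q)
        k_p = [(x + r_i * y) % q for x, y in zip(k_p, b)]
    return k_p
```

Because every basis vector lies in the null space of $M_{p}$, any linear combination of them, and therefore $k_{p}$, is orthogonal to the augmented blocks of all sources other than $C_{p}$.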

2.5. Notations and Definitions

The notations and definitions used throughout this paper are listed in the Notation section.

3. Adversarial Model

In the MD-POR scheme, only the clients are trusted because they are the data owners. The following entities are untrusted and considered to be adversaries:

(i) attackers outside the system;

(ii) the cloud servers in the system;

(iii) the TPA in the system (the TPA is assumed not to collude with the servers; this assumption is explained in Section 2.1).

Concretely, the adversaries can perform the following attacks.

3.1. Mobile Attack

This attack is performed by an adversary $A$ outside the system. $A$ can potentially corrupt all the servers over the full system lifetime. The restriction on $A$ is that he/she can control only ($h - l$) out of h servers in any given time step. Let epoch denote a given time step. In each epoch, the servers are checked. If a corruption is detected on a certain server, the blocks stored in that corrupted server are repaired from the redundancy in the intact servers. Without the server checks, the adversary $A$ can corrupt all the servers of the system in $\lceil h / (h - l) \rceil$ epochs.
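As a small worked example of this bound, the number of epochs an unchecked mobile adversary needs, rounded up to a whole epoch, can be computed as below. The function name `epochs_to_corrupt_all` is hypothetical and the sketch only restates the arithmetic of the paragraph above.

```python
def epochs_to_corrupt_all(h, l):
    """Epochs needed to corrupt all h servers when at most h - l fall per epoch."""
    if not 0 <= l < h:
        raise ValueError("require 0 <= l < h")
    per_epoch = h - l
    return -(-h // per_epoch)  # integer ceiling of h / (h - l)
```

For instance, with h = 6 servers and l = 4, the adversary controls two servers per epoch and needs three epochs, which is why periodic checking and repair within each epoch matter.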

3.2. Curious Adversary

This attack is performed by the TPA or a new server. In the check phase, the TPA is given a key κ which is constructed from all the secret keys of the clients. In the repair phase, a new server is given another key $κ^{'}$ which is also constructed from all the secret keys of the clients. When they are given their keys, these adversaries try to learn the secret keys because once all secret keys are obtained, these adversaries can fake a valid response when they are checked.

3.3. Response Forgery

This forgery is performed by the servers. In the check phase, the verifier checks all the servers to ensure that they are not corrupted. Each server has to send a response to the verifier in order to demonstrate that the server is healthy. However, a checked server may forge the response to deceive the verifier. If the forged response from the adversarial server satisfies the verification, that server can pass the check phase.

3.4. Pollution Attack

This attack is performed by the servers. The purpose of this attack is to break the linear independence of the encoded blocks. In a network, if a malicious node forwards invalid packets, the receivers obtain multiple packets and cannot tell which of their received packets are corrupt. In other words, the attack injects invalid packets to prevent data recovery. In the POR, this attack happens when a malicious server uses correct data to pass the check phase but then provides invalid data in the repair phase. For example, the client encodes the augmented blocks $w_{1}$, $w_{2}$, and $w_{3}$ to six coded blocks: $c_{11}$, $c_{12}$ (stored in the server $S_{1}$), $c_{21}$, $c_{22}$ (stored in the server $S_{2}$), and $c_{31}$, $c_{32}$ (stored in the server $S_{3}$). In the check phase, suppose that $S_{3}$ is detected as being corrupted. Then, in the repair phase, $S_{3}$ should be repaired using two coded blocks: $c_{31}^{'}$ (which is a linear combination of $c_{11}$ and $c_{12}$) and $c_{32}^{'}$ (which is a linear combination of $c_{21}$ and $c_{22}$). However, at this time, $S_{1}$ may be malicious without being detected, because the repair phase takes place after the check phase. The client still believes $S_{1}$ is healthy; thus, to recover $S_{3}$, the client requests coded blocks from $S_{1}$ and $S_{2}$, but $S_{1}$ provides an invalid coded block $c_{31}^{''}$ to the client instead of $c_{31}^{'}$.

4. The Proposed MD-POR Scheme

Before describing the proposed MD-POR scheme in detail, the technical roadmap is depicted in Figure 3. The file blocks are used to generate the augmented blocks. Then, the augmented blocks are combined with random values to compute the keys. Meanwhile, the augmented blocks are linearly combined into the coded blocks using the network coding. Finally, the coded blocks are tagged using the keys. The coded blocks and the tags are the outputs. The network coding is used because it is related to the repair feature (Section 2.3). The InterMac is used because it is related to the multiuser feature (Section 2.4). Both the network coding and the InterMac are constructed based on linear combinations; therefore, they are suitable to combine together in the proposed scheme.

Figure 3

Technical roadmap.

Let $C_{1}, \dots, C_{s}$ be the set of s clients. Each client $C_{i}$ ( $i \in {1, \dots, s}$ ) keeps a secret key $k_{i}$ and has a file $F_{i} = (v_{i 1}, \dots, v_{i g})$ where g is the number of file blocks. Each file block $v_{i j} \in F_{q}^{n}$ ( $j \in {1, \dots, g})$ . $C_{i}$ creates g augmented blocks ( $w_{i 1}, \dots, w_{i g}$ ) from g file blocks ( $v_{i 1}, \dots, v_{i g}$ ). Each augmented block $w_{i j}$ has the form as in [16]

\begin{matrix} w_{i j} = (v_{i j}, \underbrace{\underbrace{0, \dots, 0}_{g (i - 1)}, \underbrace{\overbrace{0, \dots, 0, 1}^{j}, 0, \dots, 0}_{g}, \underbrace{0, \dots, 0}_{g (s - i)}}_{m = s g}) \in F_{q}^{n + m}, \end{matrix}

(5)

where $i \in \{1, \dots, s\}$, $j \in \{1, \dots, g\}$, and $m = s g$.

Each client $C_{i}$ uses his secret key $k_{i}$ to compute the tag $t_{i j}$ for each augmented block $w_{i j}$. The augmented blocks and the tags are then linearly combined and transmitted to all the servers. In every epoch, when the servers are checked by the TPA, the servers combine the coded blocks and the tags again and send them back to the TPA. The TPA can then verify the aggregated coded blocks and tags even though it does not know any secret key $k_{i}$ ($i \in \{1, \dots, s\}$).

The proposed MD-POR scheme is now described in detail via each phase of the POR as follows.

4.1. Keygen

4.1.1. Keys for the Clients (Keygen1)

Each key $k_{p}$ of the client $C_{p}$ ( $p \in {1, \dots, s}$ ) is constructed in such a way that $k_{p}$ is orthogonal to all the augmented blocks which do not belong to $C_{p}$ . In a formal statement, $k_{p}$ is constructed as

\begin{matrix} \forall i \in \{1, \dots, s\}, i \neq p, p \in \{1, \dots, s\}, w_{i j} \cdot k_{p} = 0 . \end{matrix}

(6)

Using the InterMac (Section 2.4), a key set $\{k_{1}, \dots, k_{s}\}$ is created. Then, each $k_{p} \in F_{q}^{n + m}$ is assigned to the client $C_{p}$ as the secret key, and the sum of all the keys $κ = k_{1} + \dots + k_{s} \in F_{q}^{n + m}$ is assigned to the TPA via a secure channel. The security of the secret keys will be proved later.

4.1.2. Dynamic Keys for a New Server (Keygen2)

When a repair phase is executed, the new server is given a key $κ^{'} = (k_{1} + \dots + k_{s}) + k_{repair} = κ + k_{repair}$. The new server uses the key $κ^{'}$ to check for pollution attacks during the repair phase. κ is already computed in Keygen1; only $k_{repair}$ is different in each repair. This ensures that an adversary cannot attack the new server to obtain $k_{repair}$ and thereby pass the pollution attack check in later repair phases (explained in Section 5.4). When $k_{repair}$ is constructed for the first time, the basis $b_{1}, \dots, b_{n}$ is computed and saved. In subsequent repairs, the basis is reused to save computation cost, and only the random coefficients $r_{i}$ are regenerated to compute $k_{repair}$.

$k_{repair}$ has to be orthogonal to all augmented blocks of all the clients. Keygen2 is quite similar to Keygen1. The difference is that $p \notin \{1, \dots, s\}$: p is randomly chosen in $F_{q}$ such that $p > s$ in every repair. Since $k_{repair}$ is orthogonal to all augmented blocks of all the clients, $M_{p}$ is now the matrix consisting of all the augmented blocks of all the clients. Put differently, the rows of $M_{p}$ are the vectors in the following set:

\begin{matrix} \{w_{i j} | i = 1, \dots, s; j = 1, \dots, g\} . \end{matrix}

(7)

The set consists of $m = s g$ augmented blocks and each augmented block belongs to $F_{q}^{n + m}$ . For the $m \times (n + m)$ matrix $M_{p}$ , the rank-nullity theorem yields

\begin{matrix} \operatorname{rank} (M_{p}) + \operatorname{nullity} (M_{p}) = n + m . \end{matrix}

(8)

Since $\operatorname{rank} (M_{p}) = m$, the nullity is $\operatorname{nullity} (M_{p}) = n + m - m = n$. A basis of the null space of $M_{p}$ is now $\{b_{1}, \dots, b_{n}\}$. Let $F^{'}$ be another PRF: $\bar{K} \times (P \times [1, n]) \to F_{q}$, where $P$ denotes the domain of p. The following steps are used to generate the key $k_{p}$:

(i) $r_{i} \leftarrow F^{'} (k, p, i) \in F_{q}, \forall i \in \{1, \dots, n\}$;

(ii) $k_{p} \leftarrow \sum_{i = 1}^{n} r_{i} b_{i} \in F_{q}^{n + m}$.

Let $k_{repair}$ denote this $k_{p}$ (to distinguish it from the $k_{p}$ of Keygen1). Keygen2 is executed, and $κ^{'} = κ + k_{repair}$ is given to a new server, only when a repair phase happens. The key κ is already computed in Keygen1 as static information, and only $k_{repair}$ is different in each repair.
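Keygen2 can be illustrated with a short Python sketch. It relies on an observation that is not stated in the text and is used here only as a shortcut: since the rows of $M_{p}$ have the form (file block, unit vector) and the unit vectors together form an identity matrix, every vector of the null space can be written as $(x, -Vx)$ for some $x \in F_{q}^{n}$, where the rows of V are the file blocks. Sampling x at random therefore samples a vector of the same n-dimensional null space that the basis $b_{1}, \dots, b_{n}$ spans. The names `augment_all`, `repair_key`, and `new_server_key`, the prime `Q`, and the use of `secrets.randbelow` in place of the PRF $F^{'}$ are assumptions for illustration.

```python
import secrets

Q = 7919  # illustrative prime (assumption)

def dot(a, b, q=Q):
    """Inner product over F_q."""
    return sum(x * y for x, y in zip(a, b)) % q

def augment_all(files):
    """Flat list w_11, ..., w_sg; row r has its unit entry at position n + r."""
    blocks = [v for client in files for v in client]
    m = len(blocks)
    return [list(v) + [int(c == r) for c in range(m)] for r, v in enumerate(blocks)]

def repair_key(aug, n, q=Q):
    """Fresh k_repair = (x, -V x): orthogonal to every augmented block."""
    x = [secrets.randbelow(q) for _ in range(n)]  # stand-in for the PRF outputs
    tail = [(-dot(w[:n], x, q)) % q for w in aug]
    return x + tail

def new_server_key(kappa, k_repair, q=Q):
    """kappa' = kappa + k_repair, handed to the new server for one repair."""
    return [(a + b) % q for a, b in zip(kappa, k_repair)]
```

Because $k_{repair}$ is orthogonal to every augmented block, $c \cdot κ^{'} = c \cdot κ$ for any linear combination c of augmented blocks, so the new server can verify tags with $κ^{'}$ exactly as the TPA does with κ, while a different $k_{repair}$ masks κ in each repair.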

4.2. Encode

Step 1.

Each client $C_{i} (i \in {1, \dots, s})$ computes g tags for g augmented blocks:

\begin{matrix} \forall i \in \{1, \dots, s\}, \forall j \in \{1, \dots, g\} : t_{i j} = w_{i j} \cdot k_{i} . \end{matrix}

(9)

Step 2.

Each client $C_{i} (i \in {1, \dots, s})$ linearly combines the augmented blocks and the corresponding tags:

$\forall i \in \{1, \dots, s\}$:

(i) $\forall j \in \{1, \dots, g\}$, generate g coefficients: $α_{i j} \overset{\text{rand}}{\leftarrow} F_{q}$;

(ii) compute coded block: $w_{C_{i}} = \sum_{j = 1}^{g} α_{i j} \cdot w_{i j}$;

(iii) compute tag: $t_{C_{i}} = \sum_{j = 1}^{g} α_{i j} \cdot t_{i j}$.

Step 3.

Each client $C_{i}$ sends the pair of $(w_{C_{i}}, t_{C_{i}})$ to all h servers ( $S_{1}, \dots, S_{h}$ ). Each server $S_{x}$ where $x \in {1, \dots, h}$ creates d pairs of coded block $c_{x y}$ and corresponding tag $t_{x y}$ where $y \in {1, \dots, d}$ :

$\forall x \in \{1, \dots, h\}, y \in \{1, \dots, d\}$, $S_{x}$ computes:

(i) $\forall i \in \{1, \dots, s\}$, generate s coefficients: $β_{x y i} \overset{\text{rand}}{\leftarrow} F_{q}$;

(ii) compute coded block: $c_{x y} = \sum_{i = 1}^{s} β_{x y i} \cdot w_{C_{i}}$;

(iii) compute tag: $t_{x y} = \sum_{i = 1}^{s} β_{x y i} \cdot t_{C_{i}}$.
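The three encode steps can be sketched in Python as follows. This is a toy illustration under stated assumptions: the prime `Q`, the helper names, and in particular `toy_key` are hypothetical. `toy_key` samples a vector orthogonal to the other clients' augmented blocks directly from the block structure; it stands in for the InterMac key generation of Section 2.4 and is not the procedure described there.

```python
import random

Q = 7919  # illustrative prime (assumption)

def dot(a, b, q=Q):
    """Inner product over F_q."""
    return sum(x * y for x, y in zip(a, b)) % q

def lincomb(coeffs, vectors, q=Q):
    """sum_i coeffs[i] * vectors[i] over F_q."""
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) % q
            for k in range(len(vectors[0]))]

def augment(files):
    """aug[i][j] is the augmented block of client i, file block j (0-based)."""
    s, g = len(files), len(files[0])
    return [[list(v) + [int(c == i * g + j) for c in range(s * g)]
             for j, v in enumerate(blocks)] for i, blocks in enumerate(files)]

def toy_key(files, p, rng, q=Q):
    """Stand-in key for client p (0-based): orthogonal to all other clients' blocks."""
    x = [rng.randrange(q) for _ in range(len(files[0][0]))]
    tail = [rng.randrange(q) if i == p else (-dot(v, x, q)) % q
            for i, blocks in enumerate(files) for v in blocks]
    return x + tail

def client_encode(w_i, k_i, rng, q=Q):
    tags = [dot(w, k_i, q) for w in w_i]                 # Step 1: t_ij = w_ij . k_i
    alpha = [rng.randrange(q) for _ in w_i]              # Step 2 (i)
    w_c = lincomb(alpha, w_i, q)                         # Step 2 (ii)
    t_c = sum(a * t for a, t in zip(alpha, tags)) % q    # Step 2 (iii)
    return w_c, t_c

def server_encode(client_pairs, d, rng, q=Q):
    """Step 3: one server builds d (coded block, tag) pairs from all clients' pairs."""
    stored = []
    for _ in range(d):
        beta = [rng.randrange(q) for _ in client_pairs]
        c = lincomb(beta, [w for w, _ in client_pairs], q)
        t = sum(b * t_c for b, (_, t_c) in zip(beta, client_pairs)) % q
        stored.append((c, t))
    return stored
```

The tags are linear in the blocks, so applying the same coefficients to blocks and tags keeps each client's pair consistent with its own key, and each stored pair consistent with the sum of all keys.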

4.3. Check

The TPA is assigned the check responsibility. The TPA uses the key $κ = k_{1} + \dots + k_{s}$ to check the h servers periodically. Note that the TPA is only given the sum κ and does not learn the individual components $k_{i}$, where $i \in \{1, \dots, s\}$. Assume that the TPA does not collude with any server. The check proceeds as follows.

$\forall x \in \{1, \dots, h\}$:

(i) $S_{x}$ computes:

(a) $\forall y \in \{1, \dots, d\}$, generate d coefficients $γ_{x y} \overset{\text{rand}}{\leftarrow} F_{q}$;

(b) combine coded blocks: $c_{x} = \sum_{y = 1}^{d} c_{x y} \cdot γ_{x y}$;

(c) combine tags: $t_{x} = \sum_{y = 1}^{d} t_{x y} \cdot γ_{x y}$;

(ii) $S_{x}$ sends $\{c_{x}, t_{x}\}$ to the TPA;

(iii) the TPA computes $t_{x}^{'} = c_{x} \cdot κ$;

(iv) the TPA verifies whether $t_{x} = t_{x}^{'}$ (∗); if the equality holds, it returns $\text{true}$ (meaning that $S_{x}$ is healthy); otherwise, it returns $\text{false}$.

Correctness of the Verification (∗). Consider

$$\begin{aligned}
t_{x} &= \sum_{y = 1}^{d} t_{x y} \cdot γ_{x y} \\
&= \sum_{y = 1}^{d} \sum_{i = 1}^{s} γ_{x y} β_{x y i} t_{C_{i}} \\
&= \sum_{y = 1}^{d} \sum_{i = 1}^{s} \sum_{j = 1}^{g} γ_{x y} β_{x y i} α_{i j} t_{i j} \\
&= \sum_{y = 1}^{d} \sum_{i = 1}^{s} \sum_{j = 1}^{g} γ_{x y} β_{x y i} α_{i j} w_{i j} k_{i}, \\
t_{x}^{'} &= c_{x} \cdot κ \\
&= \sum_{y = 1}^{d} γ_{x y} c_{x y} (k_{1} + \dots + k_{s}) \\
&= \sum_{y = 1}^{d} \sum_{i = 1}^{s} γ_{x y} β_{x y i} w_{C_{i}} (k_{1} + \dots + k_{s}) \\
&= \sum_{y = 1}^{d} \sum_{i = 1}^{s} \sum_{j = 1}^{g} γ_{x y} β_{x y i} α_{i j} w_{i j} (k_{1} + \dots + k_{s}) .
\end{aligned} \tag{10}$$

As described in Section 4.1.1 (Keygen1), each key $k_{p}$, $p \in \{1, \dots, s\}$, satisfies $w_{i j} \cdot k_{p} = 0$ for all $i \in \{1, \dots, s\}$ with $i \neq p$ . Hence, in the expansion of $t_{x}^{'}$, every cross term $w_{i j} \cdot k_{p}$ with $p \neq i$ vanishes and only the term $w_{i j} \cdot k_{i}$ remains. As a result, $t_{x}^{'} = \sum_{y = 1}^{d} \sum_{i = 1}^{s} \sum_{j = 1}^{g} γ_{x y} β_{x y i} α_{i j} w_{i j} k_{i}$ . Therefore, $t_{x} = t_{x}^{'}$ .
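The correctness argument for (∗) can be exercised numerically with the following Python 3 sketch. It rests on several assumptions that are not part of the paper: `Q` is a toy prime, and `toy_setup` is a hypothetical stand-in for Keygen1 that simply forces the orthogonality property $w_{i j} \cdot k_{p} = 0$ for $i \neq p$ by construction. It is not the paper's key generation and makes no security claim; it only illustrates why the cross terms cancel.

```python
import random

Q = 2**61 - 1  # toy prime modulus (assumption), standing in for the 160-bit q


def dot(u, v, q=Q):
    """Inner product over F_q."""
    return sum(a * b for a, b in zip(u, v)) % q


def lincomb(vectors, coeffs, q=Q):
    """Component-wise sum of coeffs[i] * vectors[i] over F_q."""
    return [sum(c * v[t] for c, v in zip(coeffs, vectors)) % q
            for t in range(len(vectors[0]))]


def toy_setup(s, g, n, rng, q=Q):
    """Hypothetical stand-in for Keygen1 (NOT the paper's construction).

    Returns augmented blocks w[i][j] = (v_ij, unit vector) and keys k[p]
    such that w[i][j] . k[p] = 0 whenever i != p.
    """
    m = s * g
    w = []
    for i in range(s):
        blocks = []
        for j in range(g):
            unit = [0] * m
            unit[i * g + j] = 1
            blocks.append([rng.randrange(q) for _ in range(n)] + unit)
        w.append(blocks)
    keys = []
    for p in range(s):
        head = [rng.randrange(q) for _ in range(n)]
        # For foreign blocks, the tail entry cancels <v_ij, head>.
        tail = [rng.randrange(q) if i == p else -dot(w[i][j][:n], head, q) % q
                for i in range(s) for j in range(g)]
        keys.append(head + tail)
    return w, keys


def honest_response(w, keys, d, rng, q=Q):
    """Client aggregation, server encoding, and the response (c_x, t_x)."""
    s, g = len(w), len(w[0])
    w_C, t_C = [], []
    for i in range(s):
        alpha = [rng.randrange(q) for _ in range(g)]
        w_C.append(lincomb(w[i], alpha, q))
        t_C.append(sum(a * dot(w[i][j], keys[i], q)
                       for j, a in enumerate(alpha)) % q)
    blocks, tags = [], []
    for _ in range(d):
        beta = [rng.randrange(q) for _ in range(s)]
        blocks.append(lincomb(w_C, beta, q))
        tags.append(sum(b * t for b, t in zip(beta, t_C)) % q)
    gamma = [rng.randrange(q) for _ in range(d)]
    return lincomb(blocks, gamma, q), sum(c * t for c, t in zip(gamma, tags)) % q


def tpa_verify(c_x, t_x, kappa, q=Q):
    """Check (*): accept iff t_x equals c_x . kappa."""
    return dot(c_x, kappa, q) == t_x


def sum_keys(keys, q=Q):
    """kappa = k_1 + ... + k_s, the only key material given to the TPA."""
    return [sum(col) % q for col in zip(*keys)]
```

An honest response passes `tpa_verify` with `sum_keys(keys)`, whereas a response whose tag has been altered is rejected.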

4.4. Repair

Suppose that the server $S_{r}$ is detected as corrupted in the check phase. $S_{r}$ is replaced by a new server $S_{r}^{'}$ . The server $S_{r}^{'}$ requires l healthy servers $S_{x_{1}}, \dots, S_{x_{l}}$ to provide their combined packets, each consisting of an aggregated coded block and an aggregated tag. To check the provided packets, $S_{r}^{'}$ is given the key $κ^{'} = κ + k_{repair}$ , where $k_{repair}$ is generated by Keygen2.

Step 1.

Each server $S_{x}$, where $x \in \{x_{1}, \dots, x_{l}\}$, linearly combines its d coded blocks and linearly combines its d tags. $S_{x}$ then sends the aggregated coded block and the aggregated tag to the new server $S_{r}^{'}$ :

$\forall x \in \{x_{1}, \dots, x_{l}\}$ , $S_{x}$ performs:

(i) $\forall y \in \{1, \dots, d\}$ , generate d coefficients $γ_{x y} \overset{rand}{\leftarrow} F_{q}$ ;

(ii) combine coded blocks: $c_{x} = \sum_{y = 1}^{d} c_{x y} \cdot γ_{x y}$ ;

(iii) combine tags: $t_{x} = \sum_{y = 1}^{d} t_{x y} \cdot γ_{x y}$ ;

(iv) send the packet consisting of $\{c_{x}, t_{x}\}$ to $S_{r}^{'}$ .

Step 2.

The new server $S_{r}^{'}$ uses the key $κ^{'} = (k_{1} + \dots + k_{s}) + k_{repair}$ to check whether each server $S_{x}$, where $x \in \{x_{1}, \dots, x_{l}\}$, has provided a valid packet. This check detects pollution attacks.

Given $κ^{'} = (k_{1} + \dots + k_{s}) + k_{repair}$ , $S_{r}^{'}$ performs the following for each received packet:

(i) compute $t_{x}^{'} = c_{x} \cdot κ^{'}$ ;

(ii) check $t_{x} = t_{x}^{'}$ ( $* *$ ).

Step 3.

The new server $S_{r}^{'}$ computes d coded blocks and d tags for itself:

$\forall y \in \{1, \dots, d\}$ , $S_{r}^{'}$ computes:

(i) $\forall x \in \{x_{1}, \dots, x_{l}\}$ , generate l coefficients $θ_{x y} \overset{rand}{\leftarrow} F_{q}$ ;

(ii) new coded blocks $c_{r y} = \sum_{x = x_{1}}^{x_{l}} c_{x} \cdot θ_{x y}$ ;

(iii) new tags $t_{r y} = \sum_{x = x_{1}}^{x_{l}} t_{x} \cdot θ_{x y}$ .

Correctness of the Verification ( $* *$ ) in Step 2. The proof is similar to that of the verification (∗) in the check phase. The only difference is that the verification key involves not only $k_{1}, \dots, k_{s}$ but also $k_{repair}$ . As described in Keygen2 (Section 4.1.2), $\forall i \in \{1, \dots, s\}, p \overset{rand}{\leftarrow} F_{q}, p > s$ , and we have $w_{i j} \cdot k_{p} = 0$ . Hence, the term contributed by $k_{repair}$ vanishes in $c_{x} \cdot κ^{'}$, and $t_{x} = t_{x}^{'}$ holds for every valid packet.
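Steps 2 and 3 at the new server can be sketched in Python 3 as follows. The sketch assumes a simplified model in which a packet is valid exactly when its tag equals the inner product of its block with $κ^{'}$; the toy prime `Q`, the helper names, and the choice to raise an exception on an invalid packet are all assumptions made for illustration, since the text does not specify how a failed check is handled.

```python
import random

Q = 2**61 - 1  # toy prime modulus (assumption), standing in for the 160-bit q


def dot(u, v, q=Q):
    """Inner product over F_q."""
    return sum(a * b for a, b in zip(u, v)) % q


def repair(packets, kappa_prime, d, rng, q=Q):
    """Verify l packets (c_x, t_x), then derive d new (block, tag) pairs."""
    # Step 2: pollution check (**) on every provided packet.
    for c_x, t_x in packets:
        if dot(c_x, kappa_prime, q) != t_x:
            raise ValueError("polluted packet detected")
    # Step 3: fresh random combinations for the new server.
    width = len(packets[0][0])
    new_pairs = []
    for _ in range(d):
        theta = [rng.randrange(q) for _ in packets]
        c_ry = [sum(th * c[k] for th, (c, _) in zip(theta, packets)) % q
                for k in range(width)]
        t_ry = sum(th * t for th, (_, t) in zip(theta, packets)) % q
        new_pairs.append((c_ry, t_ry))
    return new_pairs
```

Since the new blocks and tags are built with the same coefficients $θ_{x y}$, every regenerated pair again satisfies the tag relation, so the repaired server can itself be audited later.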

5. Security Analysis

5.1. Security against Mobile Adversaries

To defend against mobile adversaries, a data repair threshold is given as follows.

Theorem 1.

The original files $F_{1}, \dots, F_{s}$ of the clients can be recovered if in any epoch, at least l out of h servers collectively store $m = s g$ coded blocks which are linearly independent combinations of m original file blocks; and the matrix consisting of the accumulated coefficients has full rank (i.e., rank m).

Proof.

Each server $S_{x}$, where $x \in \{1, \dots, h\}$, contains d coded blocks: $\{c_{x y}\} (y \in \{1, \dots, d\})$ . Each coded block $c_{x y}$ is computed from $m = s g$ augmented blocks $w_{i j}$ ( $i \in \{1, \dots, s\}, j \in \{1, \dots, g\}$ ) as $c_{x y} = \sum_{i = 1}^{s} \sum_{j = 1}^{g} β_{x y i} \cdot α_{i j} \cdot w_{i j}$ . To recover the original files, the m augmented blocks ( $w_{11}, \dots, w_{1 g}, \dots, w_{s 1}, \dots, w_{s g}$ ) are viewed as the unknowns to be solved for. To solve for these m unknowns, at least m coded blocks are needed such that the coefficient matrix has full rank, because the number of unknowns in a system of linear equations must not exceed the number of independent equations:

$$\begin{aligned}
c_{x y_{1}} &= \sum_{i = 1}^{s} \sum_{j = 1}^{g} β_{x y i_{1}} \cdot α_{i j_{1}} \cdot w_{i j} \\
c_{x y_{2}} &= \sum_{i = 1}^{s} \sum_{j = 1}^{g} β_{x y i_{2}} \cdot α_{i j_{2}} \cdot w_{i j} \\
&\;\; \vdots \\
c_{x y_{m}} &= \sum_{i = 1}^{s} \sum_{j = 1}^{g} β_{x y i_{m}} \cdot α_{i j_{m}} \cdot w_{i j} .
\end{aligned} \tag{11}$$

Therefore, at least l servers which collectively store $m = s \cdot g$ coded blocks in each epoch are required, where $\lceil m / d \rceil \leq l < h$ .
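The recoverability condition of Theorem 1 can be checked mechanically. The following Python 3 sketch computes the rank of a coefficient matrix over a prime field by Gaussian elimination and evaluates the lower bound $\lceil m / d \rceil$ on l. The function names are hypothetical, and the sketch assumes that q is prime so that inverses can be obtained with Fermat's little theorem.

```python
def rank_mod_q(rows, q):
    """Rank of an integer matrix over the prime field F_q."""
    a = [[x % q for x in row] for row in rows]
    n_cols = len(a[0]) if a else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(a)) if a[r][col]), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        inv = pow(a[rank][col], q - 2, q)  # modular inverse, valid for prime q
        a[rank] = [(x * inv) % q for x in a[rank]]
        for r in range(len(a)):
            if r != rank and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % q for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


def min_servers(s, g, d):
    """Smallest l with l * d >= m = s * g, i.e. ceil(m / d)."""
    m = s * g
    return -(-m // d)


def recoverable(coeff_rows, s, g, q):
    """True iff the accumulated coefficient matrix has full rank m = s * g."""
    return rank_mod_q(coeff_rows, q) == s * g
```

For example, with the hypothetical values s = 2, g = 3, and d = 2, `min_servers` gives 3, since three servers are needed to hold m = 6 coded blocks.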

5.2. Security against Curious Adversaries

The following theorem concerns the probability that an adversary recovers the secret keys and shows that this probability is negligible.

Theorem 2.

The secret keys of the clients are secured from the TPA and the new server.

Proof.

The TPA checks the h servers ( $S_{1}, \dots, S_{h}$ ) in the check phase using the key $κ = k_{1} + \dots + k_{s}$ . Similarly, the new server $S_{r}^{'}$ checks the l healthy servers in the repair phase using the key $κ^{'} = (k_{1} + \dots + k_{s}) + k_{repair}$ . Recovering the secret keys thus amounts to solving for s variables (in the case of the TPA) or $s + 1$ variables (in the case of $S_{r}^{'}$ ) given a single equation. The only way to do so is a brute-force search: try every possible assignment of the variables and test whether it satisfies the equation. Let $K$ denote the size of the key space. Each $k_{i} (i \in \{1, \dots, s\})$ , $k_{repair}$ , κ, and $κ^{'}$ belongs to the vector space $F_{q}^{n + m}$ (whose elements are $(n + m) \log_{2} q$ bits long), and therefore $K = q^{n + m}$ . The number of trials is $(K)^{s - 1}$ in the case of the TPA and $(K)^{s}$ in the case of $S_{r}^{'}$ . Therefore, the probability of guessing the s variables correctly is $1 / q^{(n + m) (s - 1)}$ in the case of the TPA, and the probability of guessing the $s + 1$ variables correctly is $1 / q^{(n + m) s}$ in the case of $S_{r}^{'}$ . If q is chosen as a large prime (e.g., 160 bits), $k_{1}, \dots, k_{s}$ and $k_{repair}$ cannot be recovered in polynomial time. Hence, the success probabilities of the TPA and $S_{r}^{'}$ are negligible.
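To give a feeling for the size of the search space in this proof, the following Python 3 sketch expresses $K^{s - 1}$ and $K^{s}$ in bits. It approximates q by $2^{\text{q\_bits}}$, which is an assumption made only to keep the arithmetic simple, and the function names are hypothetical.

```python
def search_space_bits(q_bits, n, m, unknown_keys):
    """log2 of K**unknown_keys with K = q**(n + m), approximating q by 2**q_bits."""
    return q_bits * (n + m) * unknown_keys


def tpa_search_bits(q_bits, n, m, s):
    """The TPA knows kappa, so s - 1 keys remain free: K**(s - 1) trials."""
    return search_space_bits(q_bits, n, m, s - 1)


def new_server_search_bits(q_bits, n, m, s):
    """The new server knows kappa', so s of the s + 1 keys remain free: K**s trials."""
    return search_space_bits(q_bits, n, m, s)
```

With a 160-bit q and the hypothetical toy sizes n = 4, m = 6, and s = 3, the TPA faces a search space of 3200 bits and the new server one of 4800 bits.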

5.3. Security against Response Forgeries

After controlling $S_{x}$ , suppose that, in the check phase, the adversary $A$ sends a pair of forged coded block and forged tag $(c_{x}^{''}, t_{x}^{''})$ to the TPA, instead of a valid pair of $(c_{x}, t_{x})$ .

Theorem 3.

The advantage of a forgery adversary to pass the check phase is

$${Adv}_{A} (verify) = {Adv}_{A} (PRF) + \frac{1}{q^{(n + m) s}} . \tag{12}$$

Proof.

To generate a pair $(c_{x}^{''}, t_{x}^{''})$ that satisfies the verification $t_{x}^{''} = c_{x}^{''} \cdot κ$ , the adversary $A$ has to obtain κ. Since the TPA is assumed not to collude with any server and κ is sent to the TPA through a secure channel, a possible way for $A$ is to attack Keygen1, in which the key $k_{p}$ of $C_{p} (p \in \{1, \dots, s\})$ is computed as follows:

(i) $r_{i} \leftarrow F (k, p, i) \in F_{q}, \forall i \in \{1, \dots, n + g\}$ ;

(ii) $k_{p} \leftarrow \sum_{i = 1}^{n + g} r_{i} b_{i} \in F_{q}^{n + m}$ .

The advantage of $A$ on $r_{i}$ is ${Adv}_{A} (PRF)$ . Since $k_{p} \in F_{q}^{n + m}$ , the advantage of $A$ on $k_{p}$ is $1 / q^{n + m}$ . The advantage of $A$ on $κ = \sum_{i = 1}^{s} k_{i}$ is $1 / q^{(n + m) s}$ . Therefore, ${Adv}_{A} (verify) = {Adv}_{A} (PRF) + 1 / q^{(n + m) s}$ . If F is a secure PRF and q is chosen large enough (for example, 160 bits), the advantage of $A$ is negligible: ${Adv}_{A} (verify) < ϵ$ .
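The two key-derivation steps quoted in this proof can be illustrated with the Python 3 sketch below. The paper does not name a concrete PRF in this section, so the use of HMAC-SHA256 for F is an assumption, as are the input encoding, the helper names, and the way the basis vectors $b_{i}$ are passed in by the caller. Reducing the digest modulo q also introduces a small bias that a real implementation would have to address.

```python
import hashlib
import hmac


def prf(master_key: bytes, p: int, i: int, q: int) -> int:
    """r_i <- F(k, p, i), instantiated here (as an assumption) with HMAC-SHA256."""
    msg = p.to_bytes(4, "big") + i.to_bytes(4, "big")
    digest = hmac.new(master_key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % q


def derive_key(master_key: bytes, p: int, basis, q: int):
    """k_p <- sum_i r_i * b_i over F_q, for caller-supplied basis vectors b_i."""
    key = [0] * len(basis[0])
    for i, b in enumerate(basis, start=1):
        r = prf(master_key, p, i, q)
        key = [(x + r * y) % q for x, y in zip(key, b)]
    return key
```

The derivation is deterministic, so a client holding the master key can regenerate $k_{p}$ on demand instead of storing it; whether the paper's scheme does so is not stated here.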

5.4. Security against Pollution Attack

Suppose that the server $S_{r}$ is found to be corrupted and $S_{x_{1}}, \dots, S_{x_{l}}$ are found to be healthy in the check phase. Then, $S_{x_{1}}, \dots, S_{x_{l}}$ are required to repair $S_{r}$ by providing their coded blocks and tags to the new server $S_{r}^{'}$ . In the repair phase, the adversary $A$ attacks $S_{x_{p}} (x_{p} \in \{x_{1}, \dots, x_{l}\})$ and then provides an invalid packet to the new server $S_{r}^{'}$ (pollution attack). Similar to Theorem 3, the advantage of $A$ to pass the pollution attack check (Step 2 in the repair phase) is

$${Adv}_{A} (pollution) = {Adv}_{A} (PRF) + \frac{1}{q^{(n + m) (s + 1)}} . \tag{13}$$

The difference is that the advantage of $A$ on $κ^{'} = k_{repair} + \sum_{i = 1}^{s} k_{i}$ is $1 / q^{(n + m) (s + 1)}$ rather than $1 / q^{(n + m) s}$ as in Theorem 3, because $κ^{'}$ is the sum of $s + 1$ keys and the adversary does not own $κ^{'}$ .

We also consider a stronger adversary $A$ who attacks $S_{r}^{'}$ right after the repair phase to steal $κ^{'}$ from $S_{r}^{'}$ . $A$ then uses $κ^{'}$ to try to pass the pollution attack check in later repair phases. However, since $k_{repair}$ is different for each repair, as explained in Keygen2 (Section 4.1.2), the advantage of $A$ in guessing the new $k_{repair}$ is ${Adv}_{A} (PRF) + 1 / q^{(n + m)}$ .

6. Efficiency Analysis

Table 1 compares the features and efficiency of the proposed MD-POR scheme with those of previous schemes. The RDC-NC [12] and NC-Audit [15] schemes are chosen for the comparison because their scenarios are the closest to that of the MD-POR scheme. Because the RDC-NC and NC-Audit schemes consider only a single client, unlike the MD-POR scheme, we assume that s clients participate in the RDC-NC and NC-Audit schemes so that the comparison is fair. However, these s clients can only run the RDC-NC and NC-Audit schemes independently in parallel; their data are not combined as in the MD-POR scheme. The parameter s therefore does not affect the checking and repairing complexity of the RDC-NC and NC-Audit schemes, because only one client can check and repair the servers. In these schemes, s affects only the server-side storage cost and the communication cost of the encode phase.

Table 1

Comparison.

Category	Item	RDC-NC [12]	NC-Audit [15]	MD-POR (our scheme)
Feature	Multiclient	No	No	Yes
	Direct repair	No	Not completed	Yes
	Symmetric key	Yes	Yes	Yes
	Public authentication	No	Yes	Yes
Storage complexity	Client-side	$O (5 (n + g) \log_{2} q)$	$O ((n + g) \log_{2} q)$	$O ((n + s g) \log_{2} q)$
	Server-side	$O (s d h ((| F | / g) + g))$	$O (s d h ((| F | / g) + g))$	$O (d h ((| F | / g) + s g))$
	TPA-side	N/A	$O ((n + g + g d h) \log_{2} q)$	$O ((n + s g) \log_{2} q)$
Encoding complexity	Computation (client)	$O (g d h)$	$O (g d h)$	$O (g)$
	Computation (server)	$O (1)$	$O (1)$	$O (s d h)$
	Computation (TPA)	N/A	$O (1)$	$O (1)$
	Communication	$O (s d h ((| F | / g) + g))$	$O (s d h ((| F | / g) + g) + s g d h)$	$O (h s ((| F | / g) + s g))$
Checking complexity	Computation (client)	$O (h)$	$O (1)$	$O (1)$
	Computation (server)	$O (h d)$	$O (h d)$	$O (h d)$
	Computation (TPA)	N/A	$O (h)$	$O (h)$
	Communication	$O (h ((| F | / g) + g))$	$O (h ((| F | / g) + g))$	$O (h ((| F | / g) + s g))$
Repairing complexity	Computation (client)	$O ((l + 1) d)$	$O (1)$	$O (1)$
	Computation (server)	$O (d l)$	$O (d l)$	$O (d l)$
	Computation (new server)	N/A	$O (d l)$	$O (d l)$
	Computation (TPA)	N/A	$O (l)$	$O (1)$
	Communication	$O ((l + d) ((| F | / g) + g))$	$O (l ((| F | / g) + g) + l d)$	$O (l ((| F | / g) + s g))$

6.1. Storage Cost

6.1.1. Client-Side

In the RDC-NC scheme, because the client keeps five secret keys in $F_{q}^{n + g}$ , the client storage is $O (5 (n + g) \log_{2} q)$ . In the NC-Audit scheme, because the client keeps only one secret key in $F_{q}^{n + g}$ , the client storage is $O ((n + g) \log_{2} q)$ . Meanwhile, the MD-POR scheme has s keys for s clients, each in $F_{q}^{n + s g}$ , and thus the storage cost per client is $O ((n + s g) \log_{2} q)$ .

6.1.2. Server-Side

The size of a file block is $| v | = | F | / g$ . An augmented block has the form $w_{i} = (v_{i}, \overbrace{\underbrace{0, \dots, 0, 1}_{i}, 0, \dots, 0}^{m = s g})$ , as indicated in Section 2.3. In the RDC-NC and NC-Audit schemes, since $s = 1$ , the size of an augmented block is $| w | = | F | / g + g$ . In the MD-POR scheme, since $m = s g$ , the size of an augmented block is $| w | = | F | / g + s g$ . Furthermore, the size of a coded block is $| c | = | w |$ because each coded block is a linear combination of augmented blocks. The number of servers is h, and each server stores d coded blocks. For the RDC-NC and NC-Audit schemes, s clients are assumed to participate in parallel. Therefore, the server storage in the RDC-NC and NC-Audit schemes is $O (s d h (| F | / g + g))$ . The server storage in the MD-POR scheme is $O (d h (| F | / g + s g))$ .
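The two server-side storage expressions can be compared numerically with the short Python 3 sketch below. It counts storage in field symbols and drops the constants hidden by the O-notation; the function names and the sample values for g and $| F | / g$ are hypothetical, while s, h, and d follow the settings reported in Section 7.

```python
def server_storage_parallel(s, d, h, block_size, g):
    """s independent single-client deployments: s * d * h * (|F|/g + g)."""
    return s * d * h * (block_size + g)


def server_storage_md_por(s, d, h, block_size, g):
    """MD-POR with combined clients: d * h * (|F|/g + s * g)."""
    return d * h * (block_size + s * g)


# s, h, d as in Section 7; block_size (= |F|/g) and g are assumed sample values.
parallel = server_storage_parallel(s=5, d=100, h=10, block_size=1000, g=50)
md_por = server_storage_md_por(s=5, d=100, h=10, block_size=1000, g=50)
```

For these values, `parallel` is 5,250,000 symbols and `md_por` is 1,250,000 symbols: the factor s multiplies the whole block in the parallel setting but only the coefficient part $s g$ in MD-POR.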

6.1.3. TPA-Side

The RDC-NC scheme does not have a TPA. In the NC-Audit scheme, the TPA not only keeps a key in $F_{q}^{n + g}$ for verification (which is $O ((n + g) \log_{2} q)$ ) but also stores the coding coefficients in $F_{q}$ which are used to compute all coded blocks (which is $O (g d h \log_{2} q)$ ). Hence, the total TPA storage in the NC-Audit scheme is $O ((n + g + g d h) \log_{2} q)$ . In the MD-POR scheme, the TPA is given $κ = k_{1} + \dots + k_{s} \in F_{q}^{n + m}$ (Section 4.1.1). In other words, $κ \in F_{q}^{n + s g}$ . The TPA storage in the MD-POR scheme is thus $O ((n + s g) \log_{2} q)$ .

6.2. Encoding Cost

6.2.1. Computation on Client-Side

In the RDC-NC and NC-Audit schemes, during the encode phase, each client combines g augmented blocks (which is $O (g)$ ) to create $d h$ coded blocks in order to store d coded blocks in each of the h servers. The cost in these schemes is thus $O (g d h)$ . In the MD-POR scheme, each client only needs to combine g augmented blocks (which is $O (g)$ ) and distribute the result to all the servers. The servers create the coded blocks by themselves. The cost in the MD-POR scheme is thus $O (g)$ .

6.2.2. Computation on Server-Side

In the RDC-NC and NC-Audit schemes, the servers do not compute anything and only receive the coded blocks computed by the clients. The cost in these schemes is thus $O (1)$ . In the MD-POR scheme, each of the h servers combines the s aggregated blocks received from the clients to compute d coded blocks for itself. The cost in the MD-POR scheme is thus $O (s d h)$ .

6.2.3. Computation on TPA-Side

In the RDC-NC scheme, the TPA does not exist. In the NC-Audit and MD-POR schemes, the TPA does nothing during the encode phase; and the costs are thus $O (1)$ .

6.2.4. Communication

In the RDC-NC scheme, the client creates $d h$ coded blocks and sends d coded blocks to each of the h servers. The size of a coded block in this scheme is ( $| F | / g + g$ ) as mentioned in Section 6.1.2. The number of clients is s. Therefore, the communication cost is $O (s d h (| F | / g + g))$ . In the NC-Audit scheme, the communication is similar to that of the RDC-NC scheme. The difference is that the client in the NC-Audit scheme sends not only the coded blocks to the servers but also all $s g d h$ coefficients which are used to create the coded blocks. The cost in the NC-Audit scheme is thus $O (s d h (| F | / g + g) + s g d h)$ . In the MD-POR scheme, each of the s clients sends its aggregated coded block to each of the h servers. The size of a coded block in the MD-POR scheme is ( $| F | / g + s g$ ) (see (5)). The cost in the MD-POR scheme is thus $O (h s (| F | / g + s g))$ .

6.3. Checking Cost

6.3.1. Computation on Client-Side

In the RDC-NC scheme, the client receives the aggregated coded block from each of h servers and verifies each of them using his/her secret key; the cost is thus $O (h)$ . In the NC-Audit and MD-POR schemes, the TPA will check the servers instead of the client. The cost in the NC-Audit and MD-POR schemes is thus $O (1)$ on the client-side.

6.3.2. Computation on Server-Side

In all three schemes, each of h servers combines its d coded blocks to send the result (an aggregated coded block) back to the verifier. The verifier is the client in the case of the RDC-NC scheme and is the TPA in the case of the NC-Audit and MD-POR schemes. The cost in all three schemes is $O (h d)$ .

6.3.3. Computation on TPA-Side

In the RDC-NC scheme, the TPA does not exist. In the NC-Audit and MD-POR schemes, the TPA verifies the aggregated coded block received from each of the h servers. Each verification takes only one operation. The cost in the NC-Audit and MD-POR schemes is $O (h)$ .

6.3.4. Communication

In the RDC-NC and NC-Audit schemes, during the check phase, each of the h servers sends its aggregated coded block to the verifier (the client in the RDC-NC scheme and the TPA in the NC-Audit scheme). The size of that coded block is ( $| F | / g + g$ ). The cost in these schemes is thus $O (h (| F | / g + g))$ . In the MD-POR scheme, the mechanism is the same as in the RDC-NC and NC-Audit schemes, except that the size of a coded block is ( $| F | / g + s g$ ). The cost in the MD-POR scheme is thus $O (h (| F | / g + s g))$ .

6.4. Repairing Cost

6.4.1. Computation on Client-Side

In the RDC-NC scheme, in the repair phase, the client first has to check the l coded blocks provided by the l healthy servers for pollution attacks (which is $O (l)$ ). Thereafter, the client computes d new coded blocks for the new server by combining the l provided coded blocks (which is $O (l d)$ ). Hence, the computation cost on the client-side in the RDC-NC scheme is $O ((l + 1) d)$ . In the NC-Audit and MD-POR schemes, the clients do nothing in the repair phase, and the cost is thus $O (1)$ .

6.4.2. Computation on Server-Side

In the RDC-NC scheme, each of the l healthy servers is required to combine its d coded blocks. Therefore, the computation cost on the server-side is $O (d l)$ . The cost in the new server is N/A because the direct repair feature is not supported in the RDC-NC scheme. In the NC-Audit and MD-POR schemes, the l healthy servers combine their coded blocks (which is $O (d l)$ ), and the new server additionally computes its d new coded blocks by combining the l provided coded blocks (which is $O (d l)$ ).

6.4.3. Computation on TPA-Side

The RDC-NC scheme does not have a TPA. In the NC-Audit scheme, the TPA has to check the l provided coded blocks for pollution attacks (which is $O (l)$ ). In the MD-POR scheme, the TPA does nothing because the pollution attack check is performed by the new server, not by the TPA as in the NC-Audit scheme. Therefore, the computation cost on the TPA-side in the MD-POR scheme is $O (1)$ .

6.4.4. Communication

In the RDC-NC scheme, each of l healthy servers sends an aggregated coded block whose size is $| F | / g + g$ to the client (which is $O (l (| F | / g + g))$ ). After computing d new coded blocks, the client sends them to the new server (which is $O (d (| F | / g + g))$ ). As a result, the communication cost in the RDC-NC scheme is $O ((l + d) (| F | / g + g))$ . In the NC-Audit scheme, each of l healthy servers also sends an aggregated coded block to the new server (which is $O (l (| F | / g + g))$ ). Then, the new server sends its linear coefficients which are used to compute d new coded blocks from l provided coded blocks to the TPA (which is $O (l d)$ ). Therefore, the communication cost in the NC-Audit scheme is $O (l (| F | / g + g) + l d)$ . In the MD-POR scheme, only each of l healthy servers sends an aggregated coded block to the new server (each coded block has the size $| F | / g + s g$ ). Therefore, the communication cost in the MD-POR scheme is $O (l (| F | / g + s g))$ .
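The three repair-communication expressions derived above can be evaluated side by side with the Python 3 sketch below. As before, costs are counted in field symbols without the constants hidden by the O-notation; the function names and the sample values for g and $| F | / g$ are assumptions, while l, d, and s follow Section 7.

```python
def repair_comm_rdc_nc(l, d, block_size, g):
    """RDC-NC: l packets to the client plus d new blocks to the new server."""
    return (l + d) * (block_size + g)


def repair_comm_nc_audit(l, d, block_size, g):
    """NC-Audit: l packets to the new server plus l * d coefficients to the TPA."""
    return l * (block_size + g) + l * d


def repair_comm_md_por(l, s, block_size, g):
    """MD-POR: l packets to the new server, each of size |F|/g + s * g."""
    return l * (block_size + s * g)


# l, d, s as in Section 7; block_size (= |F|/g) and g are assumed sample values.
rdc_nc = repair_comm_rdc_nc(l=3, d=100, block_size=1000, g=50)
nc_audit = repair_comm_nc_audit(l=3, d=100, block_size=1000, g=50)
md_por = repair_comm_md_por(l=3, s=5, block_size=1000, g=50)
```

For these sample values the results are 108,150 symbols for RDC-NC, 3,450 for NC-Audit, and 3,750 for MD-POR. The comparison with NC-Audit therefore depends on how $l d$ compares with the extra coefficient part $l (s - 1) g$ carried by MD-POR packets, so the ordering for a single phase can vary with the parameters.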

In summary, although the MD-POR scheme supports more features, its overall cost is still lower than that of the previous schemes. Let $O_{p} (A)$ , $O_{p} (B)$ , and $O_{p} (C)$ denote the total computation costs of the RDC-NC, NC-Audit, and MD-POR schemes, respectively. Let $O_{m} (A)$ , $O_{m} (B)$ , and $O_{m} (C)$ denote the total communication costs of the RDC-NC, NC-Audit, and MD-POR schemes, respectively. Let $O_{s} (A)$ , $O_{s} (B)$ , and $O_{s} (C)$ denote the total storage costs of the RDC-NC, NC-Audit, and MD-POR schemes, respectively. In practice, d and g are far larger than s and h ( $d, g ≫ s, h$ ), $l \in \{1, \dots, h\}$ , and $d > g$ . From Table 1, the following results are obtained:

(i) $O_{p} (A) - O_{p} (C) = (g d h + d) - (s d h + g) > 0$ because $g ≫ s$ ;

(ii) $O_{p} (B) - O_{p} (C) = (g d h + l) - (s d h + g) > 0$ because $g ≫ s$ ;

(iii) $O_{m} (A) - O_{m} (C) = (d h s + d - h s) (| F | / g) + g (s d h + h + l + d - h s^{2} - h s - l s) > 0$ because $d ≫ s$ and $1 \leq l \leq h$ ;

(iv) $O_{m} (B) - O_{m} (C) = (d h s - h s) (| F | / g) + g (2 s d h + h + l - h s^{2} - h s - l s) + l d > 0$ because $d ≫ s$ and $1 \leq l \leq h$ ;

(v) $O_{s} (A) - O_{s} (C) = (3 n + 5 g) \log_{2} q + (s - 1) d h (| F | / g) - 2 s g \log_{2} q > 0$ because $| F | / g = n \log_{2} q$ and $d > g$ ;

(vi) $O_{s} (B) - O_{s} (C) = (s - 1) d h (| F | / g) + g \log_{2} q (d h - 2 s + 2) > 0$ because $d ≫ s$ .

7. Performance Evaluation

This section evaluates the computation and communication performance of the proposed MD-POR scheme to show that it is applicable to a real system. A program written in Python 2.7.3 was executed on a computer with an Intel Core i5 processor (2.4 GHz), 4 GB of RAM, and a 64-bit Windows 7 OS. The length of the prime q is set to 160 bits. The number of clients is set to 5 ( $s = 5$ ), which is also the parameter used in the performance evaluation of InterMac in [16]. The number of servers is set to 10 ( $h = 10$ ). The number of coded blocks stored in each server is set to 100 ( $d = 100$ ). The number of healthy servers used for repairing is set to 3 ( $l = 3$ ). The size of each file block is set to $2^{23}$ bits (1 MB). Each result is the average of 100 runs.

The experimental results comprise three sets of computation measurements and one set of communication measurements, obtained by varying the file size of each client. The computation results are depicted in Figure 4 (encode), Figure 5 (check), and Figure 6 (repair). The communication result is depicted in Figure 7 (encode, check, and repair).

Figure 4

The computation time performance of the encode algorithm.

Figure 5

The computation time performance of the check algorithm.

Figure 6

The computation time performance of the repair algorithm.

Figure 7

The communication time performance.

Computation Performance. The experimental results reveal that the computation time increases almost linearly as the file size increases, and each graph has a different slope. Only the computation time on the TPA-side in the check phase is almost constant. In the encode phase, the slopes of the client-side and server-side graphs are approximately 0.04 and 0.002 seconds per MB, respectively. Therefore, if the file size is 1 GB, the computation time on the client-side and server-side is estimated as 41 seconds and 2 seconds, respectively. Note that the encode phase is executed only once, at the beginning; meanwhile, the check phase is executed many times during the system lifetime, and the repair phase is executed whenever a corruption is detected in the check phase. Consequently, the costs of the check and repair phases matter more than the cost of the encode phase. In the check phase, the slopes of the server-side and TPA-side graphs are approximately 0.0005 and 0, respectively. Therefore, if the file size is 1 GB, the computation time on the server-side and TPA-side is estimated as 0.52 seconds and 0.02 seconds, respectively. Similarly, in the repair phase, the slopes of the healthy server-side and new server-side graphs are approximately 0.0005 and 0.0014, respectively. Therefore, if the file size is 1 GB, the computation time on the healthy server-side and new server-side is estimated as 0.52 seconds and 1.47 seconds, respectively.
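The 1 GB estimates in this paragraph are linear extrapolations, which the Python 3 sketch below reproduces. It assumes that the slopes are in seconds per MB, that 1 GB is taken as 1024 MB, and that the intercept is zero unless given; these are assumptions inferred from the reported numbers, and the function name is hypothetical.

```python
def estimate_seconds(slope_s_per_mb, size_mb=1024, intercept_s=0.0):
    """Linear extrapolation of the measured time to a given file size."""
    return slope_s_per_mb * size_mb + intercept_s


encode_client = estimate_seconds(0.04)     # close to the reported 41 seconds
encode_server = estimate_seconds(0.002)    # close to the reported 2 seconds
check_server = estimate_seconds(0.0005)    # close to the reported 0.52 seconds
repair_new = estimate_seconds(0.0014)      # close to the reported 1.47 seconds
```

The extrapolated values agree with the reported estimates only up to rounding, which suggests that the reported figures were computed from unrounded slopes.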

Communication Performance. The MD-POR scheme is evaluated with a bandwidth of 300 Mbps. The experimental results reveal that the communication time increases almost linearly as the file size increases, and each graph in Figure 7 has a different slope. The slopes of the graphs of the encode phase, the check phase, and the repair phase are approximately 0.048, 0.008, and 0.006, respectively. Therefore, if the file size is 1 GB, the communication time of the encode phase, check phase, and repair phase is estimated as 49.27 seconds, 7.86 seconds, and 5.83 seconds, respectively. In addition, the size of the response from each server is as follows: for file sizes of 50 MB, 75 MB, 100 MB, 125 MB, and 150 MB, the response size is 13 KB, 19 KB, 26 KB, 32 KB, and 38 KB, respectively. Therefore, if the file size is 1 GB, the response size is estimated as 264.87 KB.

The above results indicate that the computation and communication costs remain practical even when the file size is 1 GB.
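The response-size extrapolation can be approximated from the five reported data points with an ordinary least-squares fit, as in the Python 3 sketch below. The fitting method is an assumption (the text does not state how the 1 GB estimate was obtained), as is the convention 1 GB = 1024 MB.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


file_sizes_mb = [50, 75, 100, 125, 150]   # reported file sizes
response_kb = [13, 19, 26, 32, 38]        # reported (rounded) response sizes

slope, intercept = fit_line(file_sizes_mb, response_kb)
estimate_kb = slope * 1024 + intercept
```

With the rounded data points the fit yields roughly 258 KB at 1 GB, which is close to the reported 264.87 KB; the remaining gap is presumably due to the rounding of the listed response sizes.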

8. Conclusion and Future Work

In this paper, a network coding-based POR scheme named MD-POR has been proposed. The MD-POR scheme supports the multiclient, direct repair, and public authentication features in a symmetric key setting. Moreover, the MD-POR scheme can protect against a strong adversary who can perform mobile attacks, curious attacks, response forgeries, and pollution attacks. Furthermore, the efficiency analysis based on complexity theory shows that, although the MD-POR scheme supports more features, its costs compare favorably with those of the previous schemes. The experimental results reveal that the computation time increases as the file size increases; however, the graphs show that the slopes for the MD-POR scheme are small. As future work, we plan to implement the RDC-NC and NC-Audit schemes in order to compare them experimentally with the MD-POR scheme. This paper has implemented only the MD-POR scheme, to show that its computation cost is practical for a real system.

Disclosure

The preliminary version of this paper was presented at WISA 2014 [24].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This study was partly supported by Grant-in-Aid for Young Scientists (B) (25730083) and CREST, JST.

References

1. Juels, Kaliski B. S. Jr. Pors: proofs of retrievability for large files. Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS ′07), November 2007, pp. 584-597. doi:10.1145/1315245.1315317

2. Shacham, Waters. Compact proofs of retrievability. Proceedings of the 4th International Conference on the Theory and Application of Cryptology and Information Security: Advances in Cryptology (ASIACRYPT ′08), December 2008, Melbourne, Australia, pp. 90-107.

3. Bowers K. D., Juels, Oprea. Proofs of retrievability: theory and implementation. Proceedings of the Workshop on Cloud Computing Security (CCSW ′09), 2009, pp. 43-54.

4. Curtmola, Khan, Burns, Ateniese. MR-PDP: multiple-replica provable data possession. Proceedings of the 28th International Conference on Distributed Computing Systems (ICDCS ′08), July 2008, pp. 411-420. doi:10.1109/icdcs.2008.68

5. Aguilera M. K., Janakiraman. Efficient fault-tolerant distributed storage using erasure codes. 2004, St. Louis, Mo, USA, Washington University.

6. Ahlswede, Cai, S.-Y. R., Yeung R. W. Network information flow. IEEE Transactions on Information Theory, 2000, 46(4), pp. 1204-1216. doi:10.1109/18.850663

7. Medard, Koetter, Karger D. R., Effros, Shi, Leong. A random linear network coding approach to multicast. IEEE Transactions on Information Theory, 2006, 52(10), pp. 4413-4430. doi:10.1109/tit.2006.881746

8. S.-Y. R., Yeung R. W., Cai. Linear network coding. IEEE Transactions on Information Theory, 2003, 49(2), pp. 371-381. doi:10.1109/tit.2002.807285

9. Agrawal, Boneh. Homomorphic MACs: MAC-based integrity for network coding. Proceedings of the 7th Conference on Applied Cryptography and Network Security (ACNS ′09), 2009, pp. 292-305. doi:10.1007/978-3-642-01957-9_18

10. Dimakis A. G., Godfrey P. B., Wainwright M. J., Ramchandran. Network coding for distributed storage systems. IEEE Transactions on Information Theory, 2010, 56(9), pp. 4539-4551. doi:10.1109/TIT.2010.2054295

11. Yang, Wang, Xue. Tree-structured data regeneration in distributed storage systems with network coding. Proceedings of the 29th Conference on Information Communications (INFOCOM ′10), 2010, pp. 2892-2900.

12. Chen, Curtmola, Ateniese, Burns. Remote data checking for network coding-based distributed storage systems. Proceedings of the ACM Workshop on Cloud Computing Security Workshop (CCSW ′10), 2010, pp. 31-42. doi:10.1145/1866835.1866842

13. Cao, Yang, Lou, Hou Y. T. LT codes-based secure and reliable cloud storage service. Proceedings of the 31st IEEE Conference on Computer Communications (INFOCOM ′12), March 2012, pp. 693-701. doi:10.1109/infcom.2012.6195814

14. Chen H. C. H., Lee P. P. C., Tang. NCCloud: applying network coding for the storage repair in a cloud-of-clouds. Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST ′12), 2012.

15. Markopoulou. NC-audit: auditing for network coding storage. Proceedings of the International Symposium on Network Coding (NetCod ′12), 2012, pp. 155-160.

16. Markopoulou. On detecting pollution attacks in inter-session network coding. Proceedings of the IEEE Conference on Computer Communications (INFOCOM ′12), March 2012, Orlando, Fla, USA, pp. 343-351. doi:10.1109/infcom.2012.6195771

17. Wang, Ren, Lou. Enabling public auditability and data dynamics for storage security in cloud computing. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(5), pp. 847-859. doi:10.1109/tpds.2010.183

18. Zhu, Wang, Ahn G. J., Yau S. S. Dynamic audit services for integrity verification of outsourced storages in clouds. Proceedings of the 26th Annual ACM Symposium on Applied Computing (SAC ′11), March 2011, pp. 1550-1557. doi:10.1145/1982185.1982514

19. Deng, Huang. Identity privacy-preserving public auditing with dynamic group for secure mobile cloud storage. Proceedings of the 8th International Conference on Network and System Security (NSS ′14), October 2014, Xi'an, China, pp. 28-40.

20. Koetter, Medard. An algebraic approach to network coding. IEEE/ACM Transactions on Networking, 2003, 11(5), pp. 782-795. doi:10.1109/tnet.2003.818197

21. Agrawal, Boneh, Boyen, Freeman D. M. Preventing pollution attacks in multi-source network coding. Proceedings of the 13th International Conference on Practice and Theory in Public Key Cryptography (PKC ′10), 2010, pp. 161-176. doi:10.1007/978-3-642-13013-7_10

22. Czap, Vajda. Signatures for multi-source network coding. IACR Cryptology ePrint Archive, 2010, 2010/328.

23. Yan, Yang, Fang. Short signature scheme for multi-source network coding. Computer Communications, 2012, 35(3), pp. 344-351. doi:10.1016/j.comcom.2011.10.012

24. Omote, Thao T. P. MDNC: multi-source and direct repair in network coding-based proof of retrievability scheme. Proceedings of the 15th International Workshop on Information Security Applications (WISA ′14), 2014, pp. 177-188.