Sage Journals: Discover world-class research

Abstract

More and more organizations outsource their data to remote cloud servers (RCSs). Data owners rent the infrastructure of cloud service providers (CSPs) to store virtually unlimited data, paying fees metered per month or per gigabyte. To increase the availability and scalability of their data, data owners store replicas on multiple servers across multiple data centers, and they need assurance that the CSPs actually store all copies according to the service contract. In this paper, we present an efficient dynamic multicopy possession checking scheme with the following properties: (1) the data owner uses a fully homomorphic encryption (FHE) algorithm to generate multiple copies; (2) the scheme supports dynamic operations on data blocks; and (3) the scheme supports public validation by a third-party auditor. Security analysis and experimental results show that our scheme resists forgery, replacement, and replay attacks and performs better than a recently published related scheme.

1. Introduction

Storing large volumes of files in the cloud has become the current trend [1]. An increasing number of users send their data to cloud service providers (CSPs) for storage. Storing data on a remote cloud server relieves organizations of the burden of server updates and other computing issues, and authorized users can conveniently access the stored data from different geographic locations. However, once data are stored on a remote cloud server that may be unreliable, the data owners can no longer control their private data directly, which raises data confidentiality and privacy protection problems. On the CSP side, errors such as software or hardware failures and operational errors by system administrators may occur [2–5]. To maintain its reputation, a CSP may try to hide data loss incidents; therefore, the CSP may not be trustworthy. To ensure data confidentiality, sensitive data is encrypted before it is outsourced to the CSP. For integrity protection, many researchers have proposed provable data possession (PDP) schemes to verify the integrity of data stored on the CSP.

To hold the CSP accountable, the data owner must be able to demand that the CSP provide evidence that the owner's data have not been discarded or damaged [6]. PDP [7] is a method for checking the integrity of data stored on the CSP. To keep the data secure, the data owner encrypts the data file and computes a tag for each encrypted block. He/she then stores the encrypted data blocks and the corresponding tags on the CSP and deletes them from local storage. When the data owner (or a verifier) wants to check the integrity of the data file, he/she sends a challenge vector to the CSP, which computes a response to the challenge and sends it back to the data owner (or verifier).

The main goal of a remote PDP scheme is to efficiently guarantee that the CSP faithfully stores the data owner's file. There are currently two types of verification schemes: provable data possession (PDP) and proof of retrievability (POR). In addition to validating the correctness of data, a POR scheme can detect corrupted data blocks and recover the original data. To check the correctness of data stored on the CSP, Ateniese et al. [7, 8] proposed two provably secure PDP schemes. Subsequently, variants of PDP based on different cryptographic hardness assumptions were proposed, such as [9–12]. In 2007, Juels and Kaliski [13] constructed a POR model and used error-correcting codes to build a sentinel-based POR protocol. Various POR schemes that check the integrity of static data files can be found in [14–19].

In practice, checking only static data files is not realistic, because data owners often modify stored data blocks. Therefore, we consider support for dynamic update operations on data blocks to be a basic requirement for storing data on a remote CSP. Update operations mainly refer to the modification, insertion, and deletion of stored data.

Another design goal is to support public validation. It is not appropriate to let either the data owner or the CSP verify the integrity of the data stored on the CSP, because neither can be guaranteed to produce unbiased verification results [20]. A third-party verifier, which has the capability to perform the verification and is trusted by both the data owners and the CSPs, is a suitable choice for storage verification. With the spread of distributed storage systems, data storage verification has become even more significant. Many schemes have been proposed, including proof of retrievability (POR) protocols [21], remote integrity checking protocols [22, 23], and provable data possession protocols [24]. However, these protocols apply only to the scenario in which the data owner verifies the integrity of the data file stored on the CSP, whereas in cloud computing, storage verification is provided by a third party rather than by the data owners. Therefore, we propose a PDP scheme that supports both public verifiability and dynamic operations for data stored on the CSP.

(1) Related Work. Ateniese et al. [7] first proposed the PDP paradigm and built two PDP schemes; however, they did not consider multicopy storage or dynamic data operations. To address dynamic data operations, Zeng [10] and Wang et al. [20] proposed dynamic versions that support limited block operations but not block insertion. In 2009, Erway et al. [25] modified the PDP scheme in [7] to support dynamic updates of data files stored on the CSP by applying rank-based authenticated skip lists; however, the computation and communication efficiency of their scheme is good only for a single data block. Wang et al. [20] verified the integrity of file copies using a Merkle hash tree (MHT). Although their scheme supports dynamic operations, the data files stored on the CSP are not encrypted, and the scheme works only for a single data block. In 2011, Hao et al. [26] proposed a PDP scheme that supports both dynamic operations on data files and public auditing, an interaction protocol that allows any entity, not necessarily the data owner, to verify the correctness of data stored on the CSP. However, their scheme does not consider encryption of the stored data file and cannot be modified to support storing multiple copies. Barsoum and Hasan [27] proposed a PDP scheme that encrypts a file F with an encryption algorithm that has a strong diffusion property, such as AES, to generate distinct file replicas. However, the update efficiency of their scheme is poor, because all copies stored on the CSP must be re-encrypted and re-uploaded whenever the data is updated. To solve this problem, we use an FHE algorithm to encrypt the data file and generate the multiple copies stored on the CSP, so that the copies need not be re-encrypted when data is updated. Therefore, the update efficiency of our PDP scheme is greatly improved.

The goal of our PDP scheme differs from those defined in earlier research: it aims to ensure the confidentiality, integrity, and public verifiability of the data copies stored on the CSP. Because the FHE algorithm introduces fresh random noise in each encryption, encrypting a file F several times yields multiple distinct file copies. The novelty of our scheme lies in ensuring data security and providing efficient data updates while preserving public verifiability. Because the file copies are stored on distributed cloud servers, we assume that not all copies will be damaged at the same time.

(2) Design Goal. The proposed scheme should simultaneously satisfy the following properties. (1) Correctness: the third-party auditor must accept every valid proof produced by the cloud. (2) Public auditing: the public auditor cannot obtain any information about the stored data blocks. (3) Security: the scheme resists forgery, replacement, replay, and channel attacks. In addition, the scheme can locate the corrupted data blocks within corrupted copies, so the data owner only needs to regenerate the corrupted blocks instead of an entire new file copy. To regenerate a valid copy of a corrupted block, the data owner first obtains the original data block from an uncorrupted copy and then re-encrypts it to generate a new data block copy.

The rest of the paper is organized as follows. Section 2 introduces the preliminaries, and Section 3 describes the proposed scheme. Section 4 analyzes the security of our scheme, Section 5 presents the implementation and experimental results, and Section 6 concludes the paper.

2. Preliminary

Some preliminary knowledge used in our paper is introduced in this section. Let F be a file outsourced to the CSP. It is divided into a sequence of n blocks; that is, $F = \{b_{1}, \dots, b_{n}\}$ , where $b_{j} \in Z_{q}^{*}$ for some large prime q. Denote ${\tilde{F}}_{i}$ as the ith file copy. Thus ${\tilde{F}}_{i} = \{{\tilde{b}}_{i 1}, \dots, {\tilde{b}}_{i n}\}$ , where ${\tilde{b}}_{i j}$ represents the jth file block of the ith copy.

Bilinear Map/Pairing. Denote $G_{1}$, $G_{2}$, and $G_{T}$ as three cyclic multiplicative groups of the same prime order q; that is, $|G_{1}| = |G_{2}| = |G_{T}| = q$. Let $e : G_{1} \times G_{2} \to G_{T}$ be a bilinear map [2] with the following properties:

(a) Bilinearity: for all $g_{1} \in G_{1}$, $g_{2} \in G_{2}$, and $a, b \in Z_{q}$,

$$e(g_{1}^{a}, g_{2}^{b}) = e(g_{1}, g_{2})^{ab}. \tag{1}$$

(b) Nondegeneracy: there exist $g_{3} \in G_{1}$ and $g_{4} \in G_{2}$ such that $e(g_{3}, g_{4}) \neq 1_{G_{T}}$.

(c) Computability: for all $g_{5} \in G_{1}$ and $g_{6} \in G_{2}$, there exists an efficient algorithm to compute $e(g_{5}, g_{6})$.

Computational Diffie-Hellman (CDH) Problem. Let $G_{1}$ be a cyclic multiplicative group over an elliptic curve, generated by $g'$. Given ${g'}^{a}$ and ${g'}^{b}$ for unknown $a, b \in Z_{q}$, the CDH problem is to compute ${g'}^{ab}$.

Our protocol involves three cryptographic functions: $h (\cdot)$ is a map-to-point hash function, $ψ (\cdot)$ is a pseudorandom function, and $ϑ (\cdot)$ is a pseudorandom permutation:

$h (\cdot) : {\{0,1\}}^{*} \to G_{1}$ ,

$ψ (\cdot) : Z_{q}^{*} \times \{1,2, \dots, n\} \to Z_{q}^{*}$ ,

$ϑ (\cdot) : Z_{q}^{*} \times \{1,2, \dots, n\} \to \{1,2, \dots, n\}$ .

Fully Homomorphic Encryption (FHE) [28]. The FHE scheme includes the following three algorithms:

KeyGen: input a security parameter λ and output three parameters $(ρ, η, γ)$, where $ρ = λ$, $η = λ \cdot \lceil \log n \rceil$, and $γ = 5 \cdot λ \cdot \lceil \log n \rceil / 2$. Let p be an η-bit odd integer, $p \leftarrow (2Z + 1) \cap (2^{η - 1}, 2^{η})$, and set p as the secret key sk. Compute $q_{0} \leftarrow (2Z + 1) \cap (1, 2^{γ} / p)$ and $t_{0} = q_{0} \cdot p$, and set $t_{0}$ as the public key pk.

Enc: input a plaintext $m \in \{0,1\}$ and the public key $t_{0}$; choose a random integer $q'$ from $[1, 2^{γ} / p)$ and r from $(-2^{ρ}, 2^{ρ})$, and compute the ciphertext C:

$$C = E(m, pk) = (m + 2 \cdot r + q' \cdot p) \bmod t_{0}. \tag{2}$$

Dec: input a ciphertext C and the secret key sk and output the plaintext m:

$$m = D(C, sk) = (C \bmod p) \bmod 2. \tag{3}$$

On the one hand, each encryption introduces fresh random noise; therefore, encrypting one original file τ times with the same public key generates τ different file replicas. On the other hand, we use the homomorphic property, under which operations on plaintexts correspond to operations on ciphertexts, to realize dynamic operations on the data replicas stored on the CSPs.
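The KeyGen, Enc, and Dec steps and the two properties above can be tried out with a small Python sketch. It is illustrative only and rests on assumptions of our own: log is taken as base 2, randomness comes from Python's `secrets` module, and the helper names `params`, `keygen`, `encrypt`, and `decrypt` are hypothetical. Because Equation (2) as written multiplies by p, `encrypt` receives p directly. Because r may be negative and p is odd, `decrypt` reduces C modulo p to the centered interval (−p/2, p/2] before taking the parity; a nonnegative remainder would flip the bit whenever m + 2r < 0.

```python
import secrets


def params(lam: int, n: int) -> tuple[int, int, int]:
    """Return (rho, eta, gamma) from the KeyGen formulas, taking log as log base 2."""
    log_n = (n - 1).bit_length()                  # ceil(log2 n) for n >= 2
    return lam, lam * log_n, 5 * lam * log_n // 2


def keygen(eta: int, gamma: int) -> tuple[int, int]:
    """Return (sk, pk) = (p, t0): p is an eta-bit odd integer and t0 = q0 * p."""
    p = secrets.randbits(eta) | (1 << (eta - 1)) | 1    # odd, in (2^(eta-1), 2^eta)
    bound = (1 << gamma) // p                           # floor(2^gamma / p)
    q0 = 1
    while q0 <= 1:                                      # odd q0 in (1, 2^gamma / p)
        q0 = 2 * secrets.randbelow(bound // 2) + 1
    return p, q0 * p


def encrypt(m: int, p: int, t0: int, rho: int, gamma: int) -> int:
    """Equation (2): C = (m + 2r + q'p) mod t0 with fresh r and q' on every call."""
    r = secrets.randbelow(2 ** (rho + 1) - 1) - (2 ** rho - 1)   # r in (-2^rho, 2^rho)
    q_prime = 1 + secrets.randbelow((1 << gamma) // p - 1)       # q' in [1, 2^gamma / p)
    return (m + 2 * r + q_prime * p) % t0


def decrypt(c: int, p: int) -> int:
    """Equation (3): m = (C mod p) mod 2, with C mod p taken in (-p/2, p/2]."""
    rem = c % p
    if rem > p // 2:
        rem -= p
    return rem % 2


rho, eta, gamma = params(60, 2 ** 18)        # lambda = 60 and n = 2^18, as in Section 5
p, t0 = keygen(eta, gamma)
c1 = encrypt(1, p, t0, rho, gamma)
c2 = encrypt(1, p, t0, rho, gamma)
print(c1 != c2, decrypt(c1, p), decrypt(c2, p))  # two distinct replicas of the same bit
print(decrypt((c1 + c2) % t0, p))                # ciphertext addition -> XOR of the bits
print(decrypt((c1 * c2) % t0, p))                # ciphertext multiplication -> AND of the bits
```

Adding two ciphertexts modulo $t_{0}$ adds their noise terms and multiplying them multiplies the noise terms, so the sketch checks results after a single operation, well within the noise budget set by η.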

3. Dynamic Multireplicas Provable Remote Data Possession (DPRDP) Scheme

The cloud storage model considered in this paper consists of four entities, as shown in Figure 1: (1) a data owner, an individual or organization that possesses private data; (2) the CSPs, which provide paid storage space for the data owner's files; (3) the third-party auditor, which is responsible for verifying the integrity of the file copies stored on the CSP through a challenge-response protocol; and (4) authorized users, who share the decryption key with the data owner and have the right to access the data copies from the CSP. In this paper, we mainly discuss the first three entities: the data owner, the CSP, and the third-party verifier.

Figure 1

Cloud storage system model.

3.1. DPRDP Scheme

In [29], Mukundan et al. used Paillier encryption to generate data copies. However, the encryption time complexity of their scheme is $2 \cdot O((\log N)^{3})$, which is particularly large when N is 1024 bits. Therefore, we use a fully homomorphic encryption scheme [28], whose time complexity is lower than that of the scheme in [29], to generate the data block copies.

The proposed DPRDP scheme consists of nine functions: KeyGen, CopiesGen, TagGen, Challenge, Response, Verify, Request, Execution, and IndiceRetri.

(1) KeyGen. The data owner inputs a security parameter λ and generates a public/secret key pair $(pk, sk)$ for encryption and decryption. He/she also chooses a random integer $δ \in Z_{q}$ as the secret signing key and computes the public verification key $Y = g^{δ} \in G_{2}$, where g is a generator of $G_{2}$.

(2) CopiesGen. For a file F, the data owner uses the FHE algorithm to create τ distinguishable copies $\{{\tilde{F}}_{1}, \dots, {\tilde{F}}_{τ}\}$. Each copy is denoted as ${\tilde{F}}_{i} = \{{\tilde{b}}_{i 1}, \dots, {\tilde{b}}_{i n}\}$, $1 \leq i \leq τ$, where ${\tilde{b}}_{i j}$ is an encryption of data block $b_{j}$. First, each homomorphic encryption introduces fresh random noise, so encrypting one original data block τ times with the same public key generates τ different replicas. Likewise, even if $b_{j} = b_{j'}$ for $j \neq j'$, $1 \leq j, j' \leq n$, we still get ${\tilde{b}}_{i j} \neq {\tilde{b}}_{i j'}$; thus, the data owner obtains different tags for identical data blocks. Second, we use the homomorphic property, under which operations on plaintexts correspond to operations on ciphertexts, to realize dynamic operations (data block insertion, modification, and deletion) on the replicas stored on the CSPs. Finally, authorized users only need to hold a single decryption key sk: after obtaining file copies from the CSP, they decrypt them to recover the original plaintext file.
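The copy-generation step can be sketched in Python under an explicit assumption: because Enc encrypts a single bit $m \in \{0,1\}$ while $b_{j} \in Z_{q}^{*}$, the hypothetical `copies_gen` helper encrypts each block bit by bit, once per copy. The toy parameters (ρ, η, γ) = (16, 128, 320) are our choice and far from secure, and decryption uses the centered remainder modulo p.

```python
import secrets

# Toy DGHV-style parameters (rho, eta, gamma); much smaller than a real deployment.
RHO, ETA, GAMMA = 16, 128, 320


def keygen() -> tuple[int, int]:
    p = secrets.randbits(ETA) | (1 << (ETA - 1)) | 1                 # secret key: eta-bit odd
    q0 = max(3, 2 * secrets.randbelow((1 << GAMMA) // p // 2) + 1)   # odd multiplier > 1
    return p, q0 * p                                                 # (sk, pk = t0)


def enc_bit(m: int, p: int, t0: int) -> int:
    r = secrets.randbelow(2 ** (RHO + 1) - 1) - (2 ** RHO - 1)       # fresh noise per call
    q_prime = 1 + secrets.randbelow((1 << GAMMA) // p - 1)
    return (m + 2 * r + q_prime * p) % t0


def dec_bit(c: int, p: int) -> int:
    rem = c % p
    return (rem - p if rem > p // 2 else rem) % 2                    # centered remainder


def copies_gen(block: int, width: int, tau: int, p: int, t0: int) -> list[list[int]]:
    """Return tau replicas of one block; each replica encrypts the block bit by bit."""
    bits = [(block >> k) & 1 for k in range(width)]                  # least significant first
    return [[enc_bit(b, p, t0) for b in bits] for _ in range(tau)]


def decrypt_block(replica: list[int], p: int) -> int:
    return sum(dec_bit(c, p) << k for k, c in enumerate(replica))


sk, pk = keygen()
replicas = copies_gen(0xBEEF, 16, 3, sk, pk)                        # tau = 3 copies of b_j
print(len({tuple(r) for r in replicas}))                             # number of distinct replicas
print([hex(decrypt_block(r, sk)) for r in replicas])                 # every replica decrypts to b_j
```

Only sk is needed to read any replica, which matches the single-key property for authorized users described above.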

(3) TagGen. We assume that the data owner generates the tags sequentially in the order of the index j; that is, the tag for data block ${\tilde{b}}_{i 2}$ is generated after that for ${\tilde{b}}_{i 1}$. For each data block ${\tilde{b}}_{i j}$, $1 \leq i \leq τ$, $1 \leq j \leq n$, the data owner does the following:

(a) chooses a random element $ω \in_{R} G_{1}$; this element is chosen once and reused for all blocks, since the verification equation (4) involves a single ω;

(b) computes $T_{i j} = {(h(F_{N} ‖ i) \cdot ω^{{\tilde{b}}_{i j}})}^{δ} \in G_{1}$, where $F_{N}$ is the name of file F;

(c) outputs $T_{i j}$;

(d) computes the tag $Tag$ of $({\tilde{b}}_{i j}, T_{i j})$ using the secret key δ and sends $Tag$ and $({\tilde{b}}_{i j}, T_{i j})$ to the CSP.

Upon receiving $Tag$ and $({\tilde{b}}_{i j}, T_{i j})$, the CSP verifies whether the block-tag pair $({\tilde{b}}_{i j}, T_{i j})$ is correct. If it is, the CSP stores the pair; otherwise, the CSP rejects it.
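To make the tag formula concrete without an elliptic-curve library, the sketch below works in a toy "exponent model": every group element $g^{x}$ is represented by its discrete logarithm x mod q, so multiplication in $G_{1}$ becomes addition and exponentiation becomes multiplication mod q. The prime $2^{255} - 19$ stands in for q, SHA-256 reduced mod q stands in for the map-to-point hash h, and small integers stand in for the FHE ciphertexts ${\tilde{b}}_{i j}$; all of these are assumptions for illustration. The model exposes every discrete logarithm, so it checks the algebra only and has no security.

```python
import hashlib

Q = 2**255 - 19   # a known prime, standing in for the group order q


def h_log(file_name: str, i: int) -> int:
    """Stand-in for h(F_N || i), returned as its discrete log mod Q (toy only)."""
    digest = hashlib.sha256(f"{file_name}||{i}".encode()).digest()
    return int.from_bytes(digest, "big") % Q


def tag_gen(file_name: str, copies: list[list[int]], delta: int, omega: int) -> list[list[int]]:
    """log T_ij = delta * (log h(F_N||i) + omega * b_ij) mod Q, i.e. T_ij = (h * w^b)^delta.

    omega is the discrete log of the single public element w; tags are produced copy by
    copy and, within each copy, in increasing block index j."""
    table = []
    for i, copy in enumerate(copies, start=1):
        h_i = h_log(file_name, i)
        table.append([delta * (h_i + omega * b) % Q for b in copy])
    return table


# Two copies of a 3-block file; the integers stand in for FHE ciphertexts b~_ij.
copies = [[101, 202, 303], [111, 222, 333]]
tags = tag_gen("report.dat", copies, delta=12345, omega=678)
print(tags[0][0] != tags[1][0])   # the same block position gets different tags in each copy
```

Because the hash input includes the copy index i, two copies produce different tags even for equal ciphertext values.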

(4) Challenge. The data owner delegates validation tasks to the third-party auditor. To challenge the CSP and verify the possession and integrity of all copies, the third-party auditor sends c (the number of data blocks to be challenged) and two challenge keys at each challenge stage: a pseudorandom permutation key $k_{1} \in Z_{q}^{*}$ and a pseudorandom function key $k_{2} \in Z_{q}^{*}$. Both the CSP and the third-party auditor use ϑ keyed with $k_{1}$ and ψ keyed with $k_{2}$ to generate a set $S = \{(j', r_{j'})\}$ of c pairs of random indices and random values, where $j' = ϑ_{k_{1}}(t)$ and $r_{j'} = ψ_{k_{2}}(t)$ for $1 \leq t \leq c$. The random index $j'$ indicates the physical position of a challenged data block. The challenge is the triple $(c, k_{1}, k_{2})$.
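The derivation of S can be sketched as follows. The paper does not fix concrete instantiations of ψ and ϑ, so this sketch assumes HMAC-SHA256 as the pseudorandom function, realizes ϑ as a keyed Fisher-Yates shuffle of {1, …, n} whose first c outputs are the challenged indices, encodes $k_{1}$ and $k_{2}$ as byte strings, and uses the prime $2^{255} - 19$ as a stand-in for q. Reducing HMAC outputs modulo (n − t) introduces a slight bias that a careful implementation would remove by rejection sampling.

```python
import hashlib
import hmac
import secrets

Q = 2**255 - 19   # a known prime, standing in for the group order q


def _prf(key: bytes, t: int) -> int:
    """HMAC-SHA256(key, t) as a 256-bit integer."""
    return int.from_bytes(hmac.new(key, t.to_bytes(8, "big"), hashlib.sha256).digest(), "big")


def psi(k2: bytes, t: int) -> int:
    """Pseudorandom coefficient in Z_q^* = {1, ..., Q - 1}."""
    return _prf(k2, t) % (Q - 1) + 1


def challenge_set(k1: bytes, k2: bytes, c: int, n: int) -> list[tuple[int, int]]:
    """S = [(theta_k1(t), psi_k2(t)) for t = 1..c] with distinct indices in 1..n.

    theta_k1 is realized as the first c outputs of a keyed Fisher-Yates shuffle of
    {1, ..., n}; the dict records only the touched positions, so the cost is O(c)."""
    assert 1 <= c <= n
    moved: dict[int, int] = {}                    # sparse view of the shuffled array
    S = []
    for t in range(c):
        j = t + _prf(k1, t) % (n - t)             # position drawn from [t, n)
        at_t, at_j = moved.get(t, t), moved.get(j, j)
        moved[t], moved[j] = at_j, at_t           # swap positions t and j
        S.append((at_j + 1, psi(k2, t + 1)))      # 1-based block index and its coefficient
    return S


k1, k2 = secrets.token_bytes(32), secrets.token_bytes(32)   # new keys for this challenge
S_auditor = challenge_set(k1, k2, c=300, n=2**18)
S_csp = challenge_set(k1, k2, c=300, n=2**18)
print(S_auditor == S_csp, len({j for j, _ in S_auditor}))
```

Because both parties run the same deterministic function on $(c, k_{1}, k_{2})$, only the short triple needs to be transmitted, and the permutation guarantees that the c challenged indices are distinct.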

(5) Response. The CSP receives the triple $(c, k_{1}, k_{2})$ and computes the random indices $j'$ and random values $r_{j'}$. It then does the following:

(a) computes $T = \prod_{(j', r_{j'}) \in S} T_{j'}^{r_{j'}}$, where $T_{j'} = \prod_{i = 1}^{τ} T_{i j'} \in G_{1}$;

(b) computes $ξ_{i} = \sum_{(j', r_{j'}) \in S} r_{j'} \cdot {\tilde{b}}_{i j'} \in Z_{q}$ for $1 \leq i \leq τ$ and sets $ξ = \{ξ_{i}\}_{1 \leq i \leq τ}$;

(c) sends $(T, ξ)$ to the verifier.

(6) Verify. After receiving the proof $(T, ξ)$ from the CSP, the third-party auditor checks the following verification equation:

$$e(T, g) \overset{?}{=} e\Big(\Big(\prod_{(j', r_{j'}) \in S} \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{j'}}\Big) \cdot ω^{\sum_{i = 1}^{τ} ξ_{i}}, Y\Big). \tag{4}$$

If the equation holds, the third-party auditor outputs 1; otherwise, it outputs 0.
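The Response and Verify steps can be checked end to end in a toy exponent model: each group element $g^{x}$ is stored as its discrete logarithm x mod q, so the pairing becomes multiplication mod q and the generator g of $G_{2}$ is represented by 1. The prime $2^{255} - 19$ (standing in for q), SHA-256 in place of the map-to-point hash h, random integers in place of FHE ciphertexts, and the helper names are all assumptions for illustration. Because every discrete logarithm is exposed, the model verifies only the algebra of Equation (4), not its security.

```python
import hashlib
import secrets

Q = 2**255 - 19   # a known prime, standing in for the group order q
# Exponent model (toy): g^x is stored as x mod Q, so e(g1^x, g2^y) = gT^(x*y) becomes
# x * y mod Q and the generator g of G2 is represented by 1. No security whatsoever.


def pair(x: int, y: int) -> int:
    return x * y % Q


def h_log(file_name: str, i: int) -> int:
    digest = hashlib.sha256(f"{file_name}||{i}".encode()).digest()
    return int.from_bytes(digest, "big") % Q


def tag_gen(file_name, copies, delta, omega):
    """log T_ij = delta * (log h(F_N||i) + omega * b_ij)."""
    return [[delta * (h_log(file_name, i) + omega * b) % Q for b in copy]
            for i, copy in enumerate(copies, start=1)]


def respond(copies, tags, S):
    """CSP: T = prod_{(j,r) in S} (prod_i T_ij)^r and xi_i = sum_{(j,r) in S} r * b_ij."""
    T = sum(r * tags[i][j - 1] for j, r in S for i in range(len(copies))) % Q
    xi = [sum(r * copy[j - 1] for j, r in S) % Q for copy in copies]
    return T, xi


def verify(file_name, tau, S, T, xi, omega, Y, g=1):
    """Auditor: e(T, g) == e((prod_{(j,r)} prod_i h(F_N||i)^r) * w^(sum_i xi_i), Y)."""
    rhs = (sum(r * h_log(file_name, i) for _, r in S for i in range(1, tau + 1))
           + omega * sum(xi)) % Q
    return pair(T, g) == pair(rhs, Y)


tau, n = 3, 8
copies = [[secrets.randbelow(2**64) for _ in range(n)] for _ in range(tau)]
delta, omega = 1 + secrets.randbelow(Q - 1), 1 + secrets.randbelow(Q - 1)
Y = delta                                      # Y = g^delta, with log g = 1
tags = tag_gen("report.dat", copies, delta, omega)
S = [(2, 11), (5, 7), (7, 3)]                  # fixed (j', r_j') pairs for illustration
T, xi = respond(copies, tags, S)
print(verify("report.dat", tau, S, T, xi, omega, Y))
copies[1][4] += 1                              # corrupt block j = 5 of copy i = 2
T, xi = respond(copies, tags, S)
print(verify("report.dat", tau, S, T, xi, omega, Y))
```

The first check succeeds; after one challenged block of copy 2 is altered, $ξ_{2}$ changes while the stored tags do not, so the second check fails.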

The challenge-response protocol between the third-party auditor and the CSP in the DPRDP scheme is summarized in Figure 2.

Figure 2

The challenge-response protocol in the DPRDP scheme.

(7) Request. When the data owner wants to update a data block stored on the CSP, he/she generates an update request containing the file name, the update command, the update index, the encrypted difference, and the updated tag. The data owner sends the update request $(F_{N}, \mathit{Request}, j, FHE(Δ b_{j}), T_{i j}')$ to the CSP, where Request denotes the update type (modification, deletion, or insertion), j is the index of the updated data block, $FHE(Δ b_{j})$ is the encryption of the difference between the original and updated data blocks, and $T_{i j}'$ is the updated tag. Data block modification includes data block addition and multiplication.

(8) Execution. The CSP receives the update request and executes the update operation: it takes as input the copies of file F, the tags T, and the update request, and outputs the copies of the updated file $F'$ and the new tags $T'$. After the update is completed, the third-party auditor executes the verification protocol to confirm that the CSP has performed the update correctly. A detailed data block modification procedure is given in Figure 3.

Figure 3

Data block modification operation procedure in the DPRDP scheme.
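The modification path can be sketched in Python under two assumptions of our own: blocks are encrypted bit by bit, and the difference $Δ b_{j}$ is the bitwise XOR of the old and new blocks, so that adding ciphertexts (addition mod 2) applies it. The owner encrypts the difference once and the CSP adds it to every stored copy; the recomputation of the tag $T_{i j}'$ is not modeled. The parameters and helper names are hypothetical.

```python
import secrets

RHO, ETA, GAMMA = 16, 128, 320   # toy DGHV-style parameters, not the paper's


def keygen() -> tuple[int, int]:
    p = secrets.randbits(ETA) | (1 << (ETA - 1)) | 1                 # eta-bit odd secret key
    q0 = max(3, 2 * secrets.randbelow((1 << GAMMA) // p // 2) + 1)
    return p, q0 * p


def enc(m: int, p: int, t0: int) -> int:
    r = secrets.randbelow(2 ** (RHO + 1) - 1) - (2 ** RHO - 1)
    return (m + 2 * r + (1 + secrets.randbelow((1 << GAMMA) // p - 1)) * p) % t0


def dec(c: int, p: int) -> int:
    rem = c % p
    return (rem - p if rem > p // 2 else rem) % 2


def enc_block(b: int, width: int, p: int, t0: int) -> list[int]:
    return [enc((b >> k) & 1, p, t0) for k in range(width)]


def dec_block(cts: list[int], p: int) -> int:
    return sum(dec(c, p) << k for k, c in enumerate(cts))


def apply_update(stored: list[int], delta_ct: list[int], t0: int) -> list[int]:
    """CSP side: add the encrypted difference bit by bit; addition mod 2 is XOR."""
    return [(c + d) % t0 for c, d in zip(stored, delta_ct)]


sk, pk = keygen()
old, new, width, tau = 0x1234, 0x1A2B, 16, 3
replicas = [enc_block(old, width, sk, pk) for _ in range(tau)]     # stored copies of b_j
delta_ct = enc_block(old ^ new, width, sk, pk)                    # FHE(Delta b_j), sent once
replicas = [apply_update(r, delta_ct, pk) for r in replicas]      # no copy is re-encrypted
print([hex(dec_block(r, sk)) for r in replicas])
```

Each replica keeps its own noise, so the updated copies remain distinct while all of them decrypt to the new block.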

Data block insertion and deletion are similar to data block modification. To insert a data block, the data owner inserts a new data block before position j in each file copy, so the number of data blocks in each copy changes from n to $n + 1$. The data owner chooses the data block $b_{j}'$ to insert and encrypts it with the FHE algorithm. The data owner also computes a new tag $T_{i j}'$ for the ciphertext ${\tilde{b}}_{i j}'$ of $b_{j}'$ and then sends $(F_{N}, \mathit{insert}, j_{before}, E(b_{j}'), T_{i j}')$ and the two keys $k_{1}$, $k_{2}$ to the CSP. Upon receiving these elements, the CSP inserts the new ciphertext ${\tilde{b}}_{i j}'$ before position j in copy file ${\tilde{F}}_{i}$, $1 \leq i \leq τ$. The updated file copy is then ${\tilde{F}}_{i}' = \{{\tilde{b}}_{i 1}, \dots, {\tilde{b}}_{i, j - 1}, {\tilde{b}}_{i j}', {\tilde{b}}_{i j}, \dots, {\tilde{b}}_{i n}\}$, and the new aggregated tag set is $\{\prod_{i = 1}^{τ} T_{i 1}, \dots, \prod_{i = 1}^{τ} T_{i, j - 1}, \prod_{i = 1}^{τ} T_{i j}', \prod_{i = 1}^{τ} T_{i j}, \dots, \prod_{i = 1}^{τ} T_{i n}\}$. Finally, the CSP runs the challenge-response protocol with the auditor to prove that it has performed the insertion correctly.

Data block deletion is the reverse of insertion: after a block is deleted, the number of file data blocks changes from n to $n - 1$. To delete the jth block from each copy, the data owner sends a delete command $(F_{N}, \mathit{Delete}, j, Θ, Θ)$ to the CSP, where Θ denotes an empty element. After receiving this command, the CSP deletes the data block at the jth position of every file copy, together with its tag.
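The CSP-side bookkeeping for insertion and deletion can be sketched with each copy and its tags held in Python lists; this storage layout and the helper names are our assumptions. Strings stand in for FHE ciphertexts, and small integers stand in for tags written as discrete logarithms, so the aggregated tag $\prod_{i = 1}^{τ} T_{i j}$ becomes a sum mod q (with $2^{255} - 19$ as a stand-in for q).

```python
Q = 2**255 - 19   # stand-in prime group order; tags are kept as discrete logs (toy)


def aggregated_tags(tags: list[list[int]]) -> list[int]:
    """prod_i T_ij for every position j, written as a sum of logs mod Q."""
    return [sum(column) % Q for column in zip(*tags)]


def insert_block(copies, tags, j, new_blocks, new_tags):
    """Insert copy i's new ciphertext and tag before 1-based position j (n -> n + 1)."""
    for i in range(len(copies)):
        copies[i].insert(j - 1, new_blocks[i])
        tags[i].insert(j - 1, new_tags[i])


def delete_block(copies, tags, j):
    """Remove the jth block and its tag from every copy (n -> n - 1)."""
    for i in range(len(copies)):
        del copies[i][j - 1]
        del tags[i][j - 1]


copies = [["b11", "b12", "b13"], ["b21", "b22", "b23"]]   # strings stand in for ciphertexts
tags = [[1, 2, 3], [10, 20, 30]]
insert_block(copies, tags, 2, ["b1x", "b2x"], [7, 70])
print(copies[0])               # ['b11', 'b1x', 'b12', 'b13']
print(aggregated_tags(tags))   # [11, 77, 22, 33]
delete_block(copies, tags, 2)
print(copies[0])               # ['b11', 'b12', 'b13']
```

Every block after position j shifts by one index, which is why the insertion is applied to all τ copies and their tag lists together.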

(9) IndiceRetri. This retrieval algorithm identifies the indices of corrupted data blocks. The response $(T, ξ)$ generated by the CSP passes the third-party auditor's verification only if the data blocks in all copies are consistent and intact; hence, if one or more data blocks are corrupted, the whole verification fails. When a file copy is corrupted, the data owner should recover it promptly so that authorized users in different locations can continue to use it. If a corrupted copy contains only one corrupted data block, regenerating the whole copy would incur huge computation and communication costs. Therefore, it is necessary to identify the indices of the corrupted data blocks within the corrupted file copy. The detailed retrieval algorithm is shown in Algorithm 1.

Algorithm 1: Retrieve corrupted block index.

Input: $(T, ξ)$, $k_{1}$, $k_{2}$.

1. Verify $e(T, g) \overset{?}{=} e\Big(\Big(\prod_{(j', r_{j'}) \in S} \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{j'}}\Big) \cdot ω^{\sum_{i = 1}^{τ} ξ_{i}}, Y\Big)$.

2. If the validation fails, then require the CSP to send the challenged pairs $({\tilde{b}}_{i j'}, T_{i j'})$ to the verifier, and let the verifier check the following τ equations, one per copy:
   $(t)\; e(T_{t}, g) \overset{?}{=} e\Big(\prod_{(j', r_{j'}) \in S} h(F_{N} ‖ t)^{r_{j'}} \cdot ω^{ξ_{t}}, Y\Big)$, $1 \leq t \leq τ$,
   where $T_{t} = \prod_{(j', r_{j'}) \in S} T_{t j'}^{r_{j'}}$.

3. If validation (t) fails, then compute $n_{c_{t}} = ⌈\sqrt{c_{t}}⌉$ and $n_{c_{t}}' = ⌊\sqrt{c_{t}}⌋$, where $c_{t}$ is the number of challenged blocks in the set that failed verification for copy t (initially $c_{t} = c$), and divide that set into $n_{c_{t}}'$ parts $S_{1_{t}}, S_{2_{t}}, \dots, S_{n_{c_{t}}'}$: each $S_{ι_{t}}$ with $1 \leq ι \leq n_{c_{t}}' - 1$ contains $n_{c_{t}}$ elements, and $S_{n_{c_{t}}'}$ contains $c_{t} - n_{c_{t}} \cdot (n_{c_{t}}' - 1)$ elements.

4. For $ι = 1, \dots, n_{c_{t}}'$, verify
   $(ι_{t})\; e(T_{ι_{t}}, g) \overset{?}{=} e\Big(\prod_{(j', r_{j'}) \in S_{ι_{t}}} h(F_{N} ‖ t)^{r_{j'}} \cdot ω^{ξ_{t}^{ι_{t}}}, Y\Big)$,
   where $T_{ι_{t}} = \prod_{(j', r_{j'}) \in S_{ι_{t}}} T_{t j'}^{r_{j'}}$ and $ξ_{t}^{ι_{t}} = \sum_{(j', r_{j'}) \in S_{ι_{t}}} r_{j'} \cdot {\tilde{b}}_{t j'}$.

5. Repeat steps 3 and 4 on each failing part until $n_{c_{t}} = 2$ or all corrupted data blocks have been checked.

End

Algorithm 1 takes four inputs: T, ξ, $k_{1}$, and $k_{2}$. If the CSP's response does not pass verification, the third-party auditor uses Algorithm 1 to identify the indices of the corrupted data blocks. The auditor first identifies which copies (and hence which cloud servers) contain corrupted data and then locates the damaged blocks within those copies. For a failing copy, the verifier computes $n_{c} = ⌈\sqrt{c}⌉$ and $n_{c}' = ⌊\sqrt{c}⌋$ and divides S into $n_{c}'$ parts $S_{1}, S_{2}, \dots, S_{n_{c}'}$, where each $S_{t}$ $(1 \leq t \leq n_{c}' - 1)$ contains $n_{c}$ elements and $S_{n_{c}'}$ contains $c - n_{c}(n_{c}' - 1)$ elements. The third-party auditor then verifies each part $S_{t}$ separately. If the response for $S_{t}$ passes verification, the data blocks whose indices belong to $S_{t}$ are intact. Otherwise, the verifier recursively checks smaller ranges of data blocks, splitting a failing part of size $c_{t}$ in the same way using $⌈\sqrt{c_{t}}⌉$ and $⌊\sqrt{c_{t}}⌋$, until two data blocks remain or all corrupted data blocks have been checked. In this way, the indices of the corrupted data blocks on the cloud server are obtained, and when recovering the corrupted copies, the data owner only needs to regenerate the corrupted data blocks rather than a new file copy.
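The subdivision strategy of Algorithm 1 can be sketched as a recursive search over the challenged indices of one failing copy. The predicate `subset_ok` is a hypothetical stand-in for the per-subset pairing check; in the demonstration it simply tests membership in a known corrupted set. Splitting follows the rule above (⌊√c⌋ parts of ⌈√c⌉ indices, with the remainder in the last part), and the search tests blocks individually once ⌈√c⌉ ≤ 2, which also avoids re-splitting a set of two or three indices into a single part.

```python
from math import isqrt


def partition(indices: list[int]) -> list[list[int]]:
    """Split c indices into floor(sqrt(c)) parts; all but the last hold ceil(sqrt(c))."""
    c = len(indices)
    n_c_prime = isqrt(c)                                              # floor(sqrt(c))
    n_c = n_c_prime if n_c_prime * n_c_prime == c else n_c_prime + 1  # ceil(sqrt(c))
    parts = [indices[k * n_c:(k + 1) * n_c] for k in range(n_c_prime - 1)]
    parts.append(indices[(n_c_prime - 1) * n_c:])                     # c - n_c * (n_c' - 1) left
    return parts


def locate_corrupted(indices: list[int], subset_ok) -> list[int]:
    """Return the corrupted indices among `indices` for one failing copy.

    subset_ok(sub) is a hypothetical stand-in for the pairing check of that copy
    restricted to the challenged pairs whose indices lie in `sub`."""
    if subset_ok(indices):
        return []
    if len(indices) <= 4:                       # ceil(sqrt(c)) <= 2: test blocks one by one
        return [j for j in indices if not subset_ok([j])]
    found: list[int] = []
    for part in partition(indices):
        found += locate_corrupted(part, subset_ok)
    return found


corrupted = {17, 42}


def toy_check(sub: list[int]) -> bool:
    return corrupted.isdisjoint(sub)


print(locate_corrupted(list(range(1, 101)), toy_check))   # [17, 42]
```

With c = 100 and two corrupted blocks, the search checks 10 parts of 10 indices, splits each failing part into parts of sizes 4, 4, and 2, and then tests single blocks only inside the failing parts of size 4.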

We give an example of Algorithm 1. As shown in Figure 4, Algorithm 1 first identifies the corrupted copies ${\tilde{F}}_{2}$ and ${\tilde{F}}_{τ}$ (marked with blue frames) and then the corrupted block positions ${\tilde{b}}_{i 1}$ and ${\tilde{b}}_{i 3}$. That is, the actually corrupted data blocks are ${\tilde{b}}_{21}$ and ${\tilde{b}}_{τ 3}$ (marked with red frames). Therefore, the data owner only needs to regenerate the data block copies ${\tilde{b}}_{21}$ and ${\tilde{b}}_{τ 3}$ and store them back in copies ${\tilde{F}}_{2}$ and ${\tilde{F}}_{τ}$, respectively.

Figure 4

Example of Algorithm 1.

4. Security Analysis

Theorem 1.

If both the data owner and the CSP honestly execute the DPRDP scheme, the CSP's response passes the auditor's validation:

$$\begin{aligned} e(T, g) &= e\Big(\prod_{(j', r_{j'}) \in S} \prod_{i = 1}^{τ} T_{i j'}^{r_{j'}}, g\Big) \\ &= e\Big(\prod_{(j', r_{j'}) \in S} \prod_{i = 1}^{τ} \big(h(F_{N} ‖ i) \cdot ω^{{\tilde{b}}_{i j'}}\big)^{r_{j'}}, Y\Big) \\ &= e\Big(\Big(\prod_{(j', r_{j'}) \in S} \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{j'}}\Big) \cdot ω^{\sum_{i = 1}^{τ} ξ_{i}}, Y\Big). \end{aligned} \tag{5}$$
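Equation (5) compresses two steps. Writing $X = \Big(\prod_{(j', r_{j'}) \in S} \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{j'}}\Big) \cdot ω^{\sum_{i = 1}^{τ} ξ_{i}}$, the definitions of $T_{i j}$ (TagGen) and $ξ_{i}$ (Response) give $T = X^{δ}$ because $G_{1}$ is commutative, and bilinearity (1) then moves δ from the first argument of the pairing to the second:

$$\begin{aligned} T &= \prod_{(j', r_{j'}) \in S} \prod_{i = 1}^{τ} \big(h(F_{N} ‖ i) \cdot ω^{{\tilde{b}}_{i j'}}\big)^{δ r_{j'}} = \Big(\Big(\prod_{(j', r_{j'}) \in S} \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{j'}}\Big) \cdot ω^{\sum_{i = 1}^{τ} \sum_{(j', r_{j'}) \in S} r_{j'} {\tilde{b}}_{i j'}}\Big)^{δ} = X^{δ}, \\ e(T, g) &= e(X^{δ}, g) = e(X, g)^{δ} = e(X, g^{δ}) = e(X, Y). \end{aligned}$$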

This proves Theorem 1. The remaining security analysis is given below.

(1) Scenario. There are four entities in the security model of our protocol: the data owner, the verifier, the CSP, and authorized users. The security analysis involves only the first three participants, so we consider only the communication among the data owner, the verifier, and the CSP. The data owner encrypts each data block to obtain its replicas, generates the tags corresponding to the replicas, and stores the block-tag pairs on the CSP. When the verifier wants to validate the integrity of the data copies, it sends a challenge vector to the CSP. Upon receiving the challenge vector, the CSP aggregates the tags of the challenged data blocks and the challenged data blocks, respectively, and sends the aggregates back to the verifier, which finally checks them. The interaction among the data owner, CSP, and verifier is shown in Figure 5.

Figure 5

The interaction among the data owner, CSP, and verifier.

(2) Possible Attacks

(a) Internal Attack. The main internal attacks considered in our scheme are forgery, replacement, and replay attacks, all of which are initiated by a dishonest CSP. A dishonest CSP may forge a single data block tag or replace a corrupted block-tag pair with another valid block-tag pair. It may even try to answer the verifier's current challenge with a previous proof, without actually accessing the current data blocks.

(b) External Attack. An adversary may tamper with the messages transmitted on the channel.

(3) Interaction Protocol Design. To clearly present the interaction between any two participants, a detailed communication process is given in Figure 6.

Figure 6

The communication process among the data owner, CSP, and verifier.

(4) Security Analysis

(a) External Attack. An adversary may tamper with the messages transmitted on the channels shown in Figure 4. On the first transmission channel, the data owner sends $(T_{i j}, {\tilde{b}}_{i j})$ and the tag $Tag(T_{i j}, {\tilde{b}}_{i j})$ to the CSP. Upon receiving these elements, the CSP uses $Tag(T_{i j}, {\tilde{b}}_{i j})$ and the public key Y to verify whether the received block-tag pair $(T_{i j}, {\tilde{b}}_{i j})$ is correct. If the adversary tampers with $T_{i j}$ or ${\tilde{b}}_{i j}$, the validation fails and the CSP refuses to store the received pair. Therefore, the adversary cannot attack the first channel successfully.

For the adversary, tampering with the messages transmitted on channels ② and ③ can be regarded as the CSP tampering with these messages. In addition to tampering with messages, the CSP can also launch forgery, replacement, and replay attacks; that is, the CSP has stronger attack capabilities than an external adversary. Therefore, for the messages transmitted on the second and third channels, we only need to consider internal attacks initiated by the CSP.

(b) Internal Attack

Forgery. A dishonest CSP attempts to forge the tag of some data block so that the forged tag equals the original tag. For a single tag $T_{i j} = {(h(F_{N} ‖ i) \cdot ω^{{\tilde{b}}_{i j}})}^{δ}$, the CSP may change the input of the hash function $h(\cdot)$ or the value of ${\tilde{b}}_{i j}$. Because $h(\cdot)$ is collision resistant, changing the input value of $F_{N}$, i, or j changes the hash result unpredictably. Likewise, for the random element $ω \in G_{1}$, changing ${\tilde{b}}_{i j}$ changes the exponentiation result. Moreover, without the secret key δ, the CSP cannot raise a modified value to the power δ, since it knows only $Y = g^{δ}$. Therefore, it is hard for the CSP to forge a single tag.

Replacement. Assume that the CSP uses a valid block-tag pair $(T_{i w}, {\tilde{b}}_{i w})_{w \notin S}$ to replace a block-tag pair $(T_{i l}, {\tilde{b}}_{i l})_{l \in S}$. We show that, in this case, the resulting proof $(T', ξ')$ passes the third-party auditor's verification only with negligible probability.

According to the verification equation, we obtain the following equations:

$$\begin{aligned} e(T', g) &= e\Big(\Big(\prod_{(j', r_{j'}) \in S, j' \neq l} \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{j'}}\Big) \cdot ω^{\sum_{i = 1}^{τ} \sum_{(j', r_{j'}) \in S, j' \neq l} r_{j'} \cdot {\tilde{b}}_{i j'}} \cdot \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{w}} \cdot ω^{\sum_{i = 1}^{τ} r_{w} \cdot {\tilde{b}}_{i w}}, Y\Big), \\ e(T, g) &= e\Big(\Big(\prod_{(j', r_{j'}) \in S} \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{j'}}\Big) \cdot ω^{\sum_{i = 1}^{τ} ξ_{i}}, Y\Big). \end{aligned} \tag{6}$$

We set

$$\begin{aligned} \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{w}} \cdot ω^{\sum_{i = 1}^{τ} r_{w} \cdot {\tilde{b}}_{i w}} &= g^{x'}, \\ \prod_{i = 1}^{τ} h(F_{N} ‖ i)^{r_{l}} \cdot ω^{\sum_{i = 1}^{τ} r_{l} \cdot {\tilde{b}}_{i l}} &= g^{y'}, \end{aligned} \tag{7}$$

where $x', y' \in Z_{q}^{*}$ and $x' \neq y'$.

If $(T^{'}, ξ^{'})$ can pass the verifier's verification, we have $g^{x^{'}} = g^{y^{'}}$ . Since q is a large prime integer, the probability of $g^{x^{'}} = g^{y^{'}}$ is negligible.

Replay. The essence of a replay attack is to use a previous proof in place of the response to the current challenge; that is, the CSP does not actually access the latest data. To some extent, a replay attack can be regarded as multiple replacement attacks, so it can be analyzed in the same way as the replacement attack. If the latest challenged block-tag pairs differ from the previous block-tag pairs, the previous response will not pass the verifier's verification.

5. Performance Analysis

To implement our scheme, we simulated the DPRDP protocol in C on Windows 7. All experiments were performed on a computer with an Intel(R) Core(TM) 3.10 GHz processor and 4 GB RAM. We set the encryption security parameter $λ = 60$ according to [28]. We store copies of data files of 1 MB, 5 MB, 10 MB, and 20 MB, each copy divided into $2^{18}$ blocks. We assume that the elliptic curve group has a 256-bit group order [30]. The communication cost of each phase (for a 1 MB file) is presented in Table 1, where s is the number of data files, $B_{h}$ is the size of the hash function output, C denotes the size of a ciphertext, $B_{N}$ is the size of the file name, and $ΔC$ is the size of $FHE(Δ b_{j})$.

Table 1

DPRDP scheme communication cost with file size 1 MB.

Phase	Cost	Data	From	To
Storage	$2^{23} \cdot \log C$ bits	C	Data owner	CSP
Challenge	$\log_{2} n + 2 \log_{2} q$ bits	$(c, k_{1}, k_{2})$	Data owner/verifier	CSP
Verification	$τ \cdot c \cdot (2 \cdot q^{2} \cdot B_{h} \cdot C \cdot \log q + \log C \cdot q)$ bits	$(T, ξ)$	CSP	Data owner/verifier
Update	$\log B_{N} + \log Δ C + 2 \cdot τ \cdot q \cdot B_{h} \cdot C \cdot \log q$ bits	$(F_{N}, \text{Request}, j, \mathrm{FHE}(Δ b_{j}), T_{i j}^{'})$	Data owner	CSP
Retrieval	$c \cdot τ \cdot (2 \cdot B_{h} \cdot C \cdot \log q + \log C)$ bits	$(T_{i j}, {\tilde{b}}_{i j})$	CSP	Data owner/verifier
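Two rows of Table 1 depend only on parameters stated in the text, so they can be evaluated directly. The sketch below rests on three assumptions:

- 1 MB is taken as $2^{20}$ bytes.
- n is read as the number of blocks per copy ($2^{18}$).
- $\log_{2} q$ is read as the 256-bit group order.

Treating these as the inputs to the Challenge row is our reading of the table. With these values, each block of the 1 MB file is 4 bytes, and a challenge costs $18 + 2 \cdot 256 = 530$ bits. The storage row is left as a function of $\log C$ because C is not fixed in the text.

```python
import math

MB = 2**20           # bytes per MB (assumption: binary megabytes)
NUM_BLOCKS = 2**18   # blocks per copy, as stated in the experimental setup

def block_size_bytes(file_mb: int, num_blocks: int = NUM_BLOCKS) -> float:
    """Size of one block when a file_mb-MB file is split into num_blocks blocks."""
    return file_mb * MB / num_blocks

def challenge_cost_bits(n: int, q_bits: int) -> int:
    """Challenge row: log2(n) + 2*log2(q) bits for (c, k1, k2)."""
    return math.ceil(math.log2(n)) + 2 * q_bits

def storage_cost_bits(log_c: float) -> float:
    """Storage row for a 1 MB file: 2^23 * log C bits (2^23 = 8 * MB bits)."""
    return 2**23 * log_c

for size in (1, 5, 10, 20):
    print(f"{size} MB -> {block_size_bytes(size)} bytes per block")
print(challenge_cost_bits(NUM_BLOCKS, 256))  # 18 + 2*256 = 530
```

The loop prints block sizes of 4.0, 20.0, 40.0, and 80.0 bytes for the four file sizes.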

We compare the performance of our DPRDP scheme with that of the DMR-PDP scheme proposed in [29].

Figure 7 compares the data setup time of the DMR-PDP and DPRDP schemes with three copies for file sizes of 1, 5, 10, and 20 MB. As in [29], the data owner performs the data setup only once. As Figure 7 shows, the DPRDP scheme is faster than the DMR-PDP scheme.

Figure 7

Comparison of data setup time between the DMR-PDP and DPRDP schemes.

We use FHE to generate the data block copies, whereas the DMR-PDP scheme of Mukundan et al. [29] uses Paillier encryption. Three factors mainly determine the efficiency of data setup.

(1) Generation of encryption parameters: generating FHE parameters is much faster than generating Paillier parameters. Besides generating random numbers, Paillier key generation must compute a least common multiple, which is relatively time-consuming when both p and q are large. However, this is not the main factor affecting data setup efficiency.

(2) Encryption: the time complexity of FHE encryption is $O (γ)$, whereas that of Paillier encryption is $2 \cdot O ({(\log N)}^{3})$. With the security parameters used in the DMR-PDP and DPRDP schemes, $O (γ) < 2 \cdot O ({(\log N)}^{3})$. Encryption time is the main factor influencing data setup efficiency. As shown in Figure 8, the computational gap between the DMR-PDP and DPRDP schemes becomes increasingly pronounced as the file size grows.

(3) Signature parameter setting: both schemes involve exponentiation, but the DMR-PDP scheme additionally performs N multiplications and additions, which slows it down.

Therefore, the DPRDP scheme is more efficient than the DMR-PDP scheme in data setup, and its advantage grows with the file size.
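The encryption-cost gap in factor (2) can be seen in miniature. The sketch below contrasts two toy schemes:

- Textbook Paillier, with the common choice $g = n + 1$ and the lcm step mentioned above.
- A DGHV-style symmetric integer scheme, $c = p \cdot k + T \cdot r + m$.

That the FHE of [28] has this form is an assumption suggested by the γ parameter; this section does not state it. All parameters are toy values and insecure, and all function names are hypothetical.

Paillier encryption needs two modular exponentiations modulo $n^{2}$, while the integer scheme needs one multiplication and two additions. The final lines also generate three copies of one block. The ciphertexts differ, yet a single secret key decrypts all of them.

```python
import math
import secrets

# ---- Toy Paillier (textbook, g = n + 1); insecure small primes ----
def paillier_keygen(p: int, q: int):
    n = p * q
    lam = math.lcm(p - 1, q - 1)       # the least-common-multiple step
    mu = pow(lam, -1, n)               # valid for g = n + 1 when gcd(lam, n) = 1
    return n, (lam, mu)

def paillier_encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    # Two modular exponentiations modulo n^2 dominate the cost.
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def paillier_decrypt(n: int, sk, c: int) -> int:
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n       # L(u) = (u - 1) / n, then multiply by mu

# ---- Toy DGHV-style symmetric integer encryption (assumed form) ----
PLAIN = 2**32                          # plaintext space: one 4-byte block

def fhe_keygen(eta: int = 256) -> int:
    """Secret key: a random odd eta-bit integer."""
    return secrets.randbits(eta) | (1 << (eta - 1)) | 1

def fhe_encrypt(key: int, m: int, gamma: int = 512) -> int:
    mult = secrets.randbits(gamma - key.bit_length())  # large random multiplier
    noise = secrets.randbits(32)                        # small noise
    return key * mult + PLAIN * noise + m               # one multiply, two adds

def fhe_decrypt(key: int, c: int) -> int:
    return c % key % PLAIN             # correct while PLAIN * noise + m < key

n, sk = paillier_keygen(1009, 1013)
print(paillier_decrypt(n, sk, paillier_encrypt(n, 42)))

key = fhe_keygen()
block = 0xDEADBEEF
copies = [fhe_encrypt(key, block) for _ in range(3)]  # three replicas of one block
print(len(set(copies)), all(fhe_decrypt(key, c) == block for c in copies))
```

Adding two integer-scheme ciphertexts adds the plaintexts modulo `PLAIN`, since the noise terms stay far below the key. This additive property is what lets such a scheme act as a homomorphic tool.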

Figure 8

(a) Comparison of the data owner's signature computation time between the DMR-PDP and DPRDP schemes for file sizes of 1 MB and 5 MB. (b) Comparison of the data owner's signature computation time between the DMR-PDP and DPRDP schemes for file sizes of 10 MB and 20 MB.

Figure 8 compares the data owner's signing time in the DPRDP and DMR-PDP schemes, with the number of replicas set to 3, 5, and 10. Solid lines denote the signing time in the DMR-PDP scheme, and dotted lines denote the signing time in the DPRDP scheme.

Figure 8(a) shows the comparison for file sizes of 1 MB and 5 MB. The red line shows the data owner's signing time for a 1 MB file with 3, 5, and 10 copies. Although the tag generation algorithms of the two schemes are similar, the DMR-PDP scheme additionally requires multiplication and addition operations, so its signing time is longer than that of the DPRDP scheme. The blue line shows the signing time for a 5 MB file with 3, 5, and 10 copies. With 3 data copies, signing in the DMR-PDP scheme takes 0.037 seconds longer than in the DPRDP scheme.

Figure 8(b) shows the comparison for file sizes of 10 MB and 20 MB. The green line shows the signing time for a 10 MB file with 3, 5, and 10 copies; at this file size, the difference in signing time between the two schemes is small. The purple line shows the signing time for a 20 MB file with 3, 5, and 10 copies. For the 20 MB file, the growth rate of signing time in both the DMR-PDP and DPRDP schemes increases slightly when the number of copies increases from 3 to 5.

Figure 9 compares the CSP's tag aggregation time in the DPRDP and DMR-PDP schemes, with the number of replicas set to 3, 5, and 10. Solid lines denote the tag aggregation time in the DMR-PDP scheme, and dotted lines denote that in the DPRDP scheme.

Figure 9

(a) Comparison of the CSP's tag aggregation time between the DMR-PDP and DPRDP schemes for file sizes of 1 MB and 5 MB. (b) Comparison of the CSP's tag aggregation time between the DMR-PDP and DPRDP schemes for file sizes of 10 MB and 20 MB.

Figure 9(a) shows the comparison for file sizes of 1 MB and 5 MB. The red line shows the CSP's aggregation time for a 1 MB file with 3, 5, and 10 copies. Because the tag aggregation algorithms of the two schemes are similar, the difference in aggregation time between them is small for a 1 MB file. The blue line shows the aggregation time for a 5 MB file with 3, 5, and 10 copies. With 3 data copies, aggregation in the DMR-PDP scheme takes 2.2 seconds longer than in the DPRDP scheme. This gap narrows as the number of copies increases, and with 10 copies the aggregation times of the two schemes are very close.

Figure 9(b) shows the comparison for file sizes of 10 MB and 20 MB. The green line shows the CSP's aggregation time for a 10 MB file with 3, 5, and 10 copies. For a 10 MB file, the difference in aggregation time between the two schemes is small, and the aggregation time grows increasingly fast as the number of copies increases. The purple line shows the aggregation time for a 20 MB file with 3, 5, and 10 copies; again, the aggregation time of both schemes grows increasingly fast. For the 20 MB file, the aggregation time of the DMR-PDP scheme grows faster than that of the DPRDP scheme when the number of copies increases from 5 to 10. Therefore, when both the number of data copies and the file size are large, the DPRDP scheme is more efficient.
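Figures 8 and 9 time two operations: tag generation by the data owner and tag aggregation by the CSP. The sketch below illustrates both under stated assumptions:

- Tags take the form $T_{ij} = (h(F_{N} ‖ i) \cdot ω^{\tilde{b}_{ij}})^{x}$ with $Y = g^{x}$. This is consistent with the aggregated form in equation (6) but is not defined in this section.
- The pairing check is replaced by a direct comparison using x, so this is only a private-verification stand-in.
- The group is a tiny, insecure subgroup, and `h` is a toy hash-to-group map whose discrete logarithms are known.
- `make_tag`, `aggregate`, `expected`, and all parameters are hypothetical.

```python
import hashlib
import secrets

# Toy group: order-Q subgroup of Z_P^*, P = 2Q + 1 (insecure sizes, illustration only).
P, Q, G = 2039, 1019, 4

def h(file_name: str, i: int) -> int:
    """Toy stand-in for h(F_N || i): hash to an exponent, then map into the subgroup."""
    d = hashlib.sha256(f"{file_name}||{i}".encode()).digest()
    return pow(G, int.from_bytes(d, "big") % Q, P)

def make_tag(x: int, omega: int, file_name: str, i: int, b: int) -> int:
    """Hypothetical tag T_ij = (h(F_N || i) * omega^b)^x mod P."""
    return pow(h(file_name, i) * pow(omega, b, P) % P, x, P)

def aggregate(tags, challenge) -> int:
    """CSP side: T = product over challenged (j, r_j) and copies i of T_ij^{r_j}."""
    T = 1
    for j, r in challenge:
        for t in tags[j]:
            T = T * pow(t, r, P) % P
    return T

def expected(x, omega, file_name, blocks, challenge, tau) -> int:
    """(prod_j prod_i h(F_N||i)^{r_j} * omega^{sum_i xi_i})^x, with xi_i = sum_j r_j * b_ij."""
    base = 1
    for _, r in challenge:
        for i in range(1, tau + 1):
            base = base * pow(h(file_name, i), r, P) % P
    xi = [sum(r * blocks[j][i] for j, r in challenge) for i in range(tau)]
    base = base * pow(omega, sum(xi) % Q, P) % P
    return pow(base, x, P)

tau, n, name = 3, 8, "F_N-example"              # hypothetical parameters
x = secrets.randbelow(Q - 1) + 1                 # private key; Y = G^x is assumed
omega = pow(G, 5, P)                             # hypothetical omega in the subgroup
blocks = [[secrets.randbelow(Q) for _ in range(tau)] for _ in range(n)]  # blocks[j][i]
tags = [[make_tag(x, omega, name, i + 1, blocks[j][i]) for i in range(tau)] for j in range(n)]
challenge = [(j, secrets.randbelow(Q - 1) + 1) for j in (0, 3, 5)]
print(aggregate(tags, challenge) == expected(x, omega, name, blocks, challenge, tau))  # True
```

The aggregation loop performs one exponentiation per challenged block per copy. This is consistent with the aggregation time growing with the number of copies in Figure 9.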

6. Conclusion

In this paper, we have presented a dynamic replicated data possession checking scheme for checking the integrity of data copies stored by the CSP. The data owner encrypts the data with an FHE algorithm to obtain the data replicas and then stores the encrypted replicas on the CSP.

The scheme has four main properties. First, it supports checking the integrity of the data file copies anytime and anywhere, and it also supports public verification without leaking any information to the third-party auditor. Second, authorized users can decrypt the data copies received from the CSP using a single secret decryption key. Third, the data owner can perform block-level update operations on the data replicas. Finally, the retrieval algorithms of the scheme accurately identify the corrupted data blocks in corrupted copies. The data owner can therefore reconstruct only the corrupted block copies and does not need to regenerate the whole corrupted copy.

Our security analysis shows that the scheme resists forgery, replacement, and replay attacks. In addition, the DPRDP scheme is more efficient than the DMR-PDP scheme.

Because the encryption process causes data expansion, reducing this expansion is left for future work. Doing so would further improve the efficiency of the proposed scheme.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors thank the editor and reviewers for their suggestions to improve the quality of the paper. This work was supported by the NSF of China (nos. U1433105 and U1536118) and the Beijing Higher Education Young Elite Teacher Project (YETP0448).

References

1. Buyya, Yeo C. S., Venugopal, Broberg, Brandic, "Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems, vol. 25, no. 6, pp. 599-616, 2009. DOI: 10.1016/j.future.2008.12.001.

2. Jula, Sundararajan, Othman, "Cloud computing service composition: a systematic literature review," Expert Systems with Applications, vol. 41, no. 8, pp. 3809-3824, 2014. DOI: 10.1016/j.eswa.2013.12.017.

3. Wang, Domingo-Ferrer, Qin, "Identity-based remote data possession checking in public clouds," IET Information Security, vol. 8, no. 2, pp. 114-121, 2014. DOI: 10.1049/iet-ifs.2012.0271.

4. Wang, Zeng, Yao, "Cloud-DLS: dynamic trusted scheduling for cloud computing," Expert Systems with Applications, vol. 39, no. 3, pp. 2321-2319, 2012. DOI: 10.1016/j.eswa.2011.08.048.

5. Wang H. Q., "Identity-based distributed provable data possession in multi-cloud storage," IEEE Transactions on Services Computing, vol. 8, no. 2, pp. 328-340, 2015. DOI: 10.1109/tsc.2014.1.

6. Lin Y.-K., Chang P.-C., "Maintenance reliability estimation for a cloud computing network with nodes failure," Expert Systems with Applications, vol. 38, no. 11, pp. 14185-14189, 2011. DOI: 10.1016/j.eswa.2011.04.230.

7. Ateniese, Burns, Curtmola, "Provable data possession at untrusted stores," in Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS '07), Alexandria, Va, USA, October-November 2007, ACM, pp. 598-609. DOI: 10.1145/1315245.1315318.

8. Ateniese, Burns, Curtmola, Herring, Khan, Kissner, Peterson, Song, "Remote data checking using provable data possession," ACM Transactions on Information and System Security, vol. 14, no. 1, 34 pages, 2011. DOI: 10.1145/1952982.1952994.

9. Shah M. A., Swaminathan, Baker, "Privacy-preserving audit and extraction of digital contents," Cryptology ePrint Archive, Report 2008/186, 2008.

10. Zeng, "Publicly verifiable remote data integrity," in Information and Communications Security: 10th International Conference, ICICS 2008, Birmingham, UK, October 20-22, 2008, Proceedings, Lecture Notes in Computer Science, vol. 5308, Springer, Berlin, Germany, 2008, pp. 419-434. DOI: 10.1007/978-3-540-88625-9_28.

11. Shah M. A., Baker, Mogul J. C., Swaminathan, "Auditing to keep online storage services honest," in Proceedings of the 11th USENIX Workshop on Hot Topics in Operating Systems (HotOS '07), Berkeley, Calif, USA, 2007, pp. 1-6.

12. Sebé, Domingo-Ferrer, Martínez-Ballesté, Deswarte, Quisquater J.-J., "Efficient remote data possession checking in critical information infrastructures," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 8, pp. 1034-1038, 2008. DOI: 10.1109/TKDE.2007.190647.

13. Juels, Kaliski B. S. Jr., "PORs: proofs of retrievability for large files," in Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS '07), Alexandria, Va, USA, October 2007, pp. 584-597. DOI: 10.1145/1315245.1315317.

14. Shacham, Waters, "Compact proofs of retrievability," in Advances in Cryptology, ASIACRYPT 2008, Lecture Notes in Computer Science, vol. 5350, Springer, Berlin, Germany, 2008, pp. 90-107. DOI: 10.1007/978-3-540-89255-7_7.

15. Bowers K. D., Juels, Oprea, "HAIL: a high-availability and integrity layer for cloud storage," in Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09), Chicago, Ill, USA, November 2009, ACM, pp. 187-198. DOI: 10.1145/1653662.1653686.

16. Bowers K. D., Juels, Oprea, "Proofs of retrievability: theory and implementation," in Proceedings of the ACM Cloud Computing Security Workshop (CCSW '09), Chicago, Ill, USA, 2009, pp. 43-54.

17. Dodis, Vadhan, Wichs, "Proofs of retrievability via hardness amplification," in Proceedings of the 6th Theory of Cryptography Conference (TCC '09), San Francisco, Calif, USA, March 2009, pp. 109-127.

18. Curtmola, Khan, Burns, "Robust remote data checking," in Proceedings of the 4th ACM International Workshop on Storage Security and Survivability, October 2008, pp. 63-68.

19. Juels, Kaliski B. S. Jr., "PORs: proofs of retrievability for large files," in Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS '07), November 2007, ACM, pp. 584-597. DOI: 10.1145/1315245.1315317.

20. Wang, Ren, Lou, "Toward publicly auditable secure cloud data storage services," IEEE Network, vol. 24, no. 4, pp. 19-24, 2010. DOI: 10.1109/mnet.2010.5510914.

21. Juels, Kaliski B. S. Jr., "PORs: proofs of retrievability for large files," in Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS '07), New York, NY, USA, November 2007, ACM, pp. 584-597. DOI: 10.1145/1315245.1315317.

22. Deswarte, Quisquater J.-J., Sadane, Strous S. J. L., "Remote integrity checking," in Proceedings of the 6th Working Conference on Integrity and Internal Control in Information Systems (IICIS '03), Lausanne, Switzerland, November 2003, pp. 1-11.

23. Wang H. Q., Q. H., Qin, Domingo-Ferrer, "Identity-based remote data possession checking in public clouds," IET Information Security, vol. 8, no. 2, pp. 114-121, 2014. DOI: 10.1049/iet-ifs.2012.0271.

24. Filho D. L. G., Barreto P. S. L. M., "Demonstrating data possession and uncheatable data transfer," Cryptology ePrint Archive, Report 2006/150, 2006.

25. Erway, Küpçü, Papamanthou, Tamassia, "Dynamic provable data possession," in Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09), New York, NY, USA, November 2009, ACM, pp. 213-222. DOI: 10.1145/1653662.1653688.

26. Hao, Zhong, "A privacy-preserving remote data integrity checking protocol with data dynamics and public verifiability," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 9, pp. 1432-1437, 2011. DOI: 10.1109/TKDE.2011.62.

27. Barsoum A. F., Hasan M. A., "On verifying dynamic multiple data copies over cloud servers," Cryptology ePrint Archive, Report 2011/447, 2011, http://eprint.iacr.org/.

28. Kaosar M. G., Paulet, Bertino, "Single-database private information retrieval from fully homomorphic encryption," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 1125-1134, 2013. DOI: 10.1109/tkde.2012.90.

29. Mukundan, Madria, Linderman, "Efficient integrity verification of replicated data in cloud using homomorphic encryption," Distributed and Parallel Databases, vol. 32, no. 4, pp. 507-534, 2014. DOI: 10.1007/s10619-014-7151-0.

30. Barsoum A. F., Hasan M. A., "Provable multicopy dynamic data possession in cloud computing systems," IEEE Transactions on Information Forensics and Security, vol. 10, no. 3, pp. 485-497, 2015. DOI: 10.1109/tifs.2014.2384391.

Efficient Dynamic Replicated Data Possession Checking in Distributed Cloud Storage Systems
