Abstract

Cloud storage services enable users to migrate their data and applications to the cloud, which relieves them of local data maintenance and brings great convenience. However, cloud storage servers may not be fully trustworthy, so verifying the integrity of cloud data with low overhead for users has become an increasingly important problem. Many remote data integrity protection methods have been proposed, but they authenticate cloud files one by one when verifying multiple files, so the computation and communication overhead remains high. To address this problem, a hierarchical remote data possession checking (H-RDPC) method is proposed, which provides efficient and secure remote data integrity protection and supports dynamic data operations. This paper gives the algorithm descriptions, security analysis, and false negative rate analysis of H-RDPC. The security analysis and experimental performance evaluation show that the proposed H-RDPC is efficient and reliable in verifying massive cloud files, and that it improves performance by 32–81% compared with RDPC.

Keywords

Cloud storage, data integrity, provable data possession, homomorphic tag, hash tree

Introduction

In recent years, cloud computing has developed rapidly. A large number of cloud providers such as Google, Microsoft, and Amazon have emerged. These providers offer diverse services over the Internet, including online storage, computing resources, and web application hosting (e.g. Amazon’s S3, Microsoft’s Azure service, and Google App Engine).¹ Cloud storage is one of the most important cloud applications: individuals can store data online, and companies can back up local data to the cloud.

However, the adoption of cloud services has been slow. According to a survey by Twinstrata, only 20% of people are willing to use the cloud to store their personal data, and about 50% are willing to use cloud storage for data backup and disaster recovery. The reason is that cloud storage also brings a series of new security problems; data security is a major obstacle to the further promotion of cloud storage services.

The security problems in cloud storage mainly include data integrity, confidentiality, reliability, and privacy protection.² Integrity is the focus of cloud storage security research: it ensures that data are not tampered with, or that any tampering can be detected. Data integrity checking in cloud storage verifies whether the user data on cloud storage servers are intact, so that users do not unknowingly rely on data that have been tampered with or removed.

At present, research on cloud storage integrity concentrates mainly on two approaches: provable data possession (PDP)^3,4 and proof of retrievability (POR).^5–7 Their basic idea is to use some form of challenge-response protocol,⁸ together with probabilistic checking based on pseudo-random sampling, to reduce communication and computation overhead. PDP uses a challenge-response protocol to prove that the user’s files are intact. It can detect data corruption above a certain percentage, but it cannot guarantee that the file is retrievable. Like PDP, POR proves to the user through a challenge-response protocol that the files are intact; in addition, users can retrieve the files from the server with high probability. POR combines pseudo-random sampling with redundancy encoding (such as error-correcting codes).

Related works

Ateniese et al.³ proposed the PDP model in 2007. It is a lightweight remote data authentication method in which the user fixes the number of challenges and the data blocks to be checked. The disadvantages of PDP are that the numbers of updates and verifications are limited and that it supports only static data; it cannot support dynamic operations. Building on this public-key-based scheme, Ateniese et al. then proposed a dynamic PDP⁴ whose validation process is based on symmetric-key cryptography. It supports update and delete operations on file blocks, but it cannot support block insertion.

Erway et al. presented a framework and a construction for dynamic provable data possession (DPDP),^9,10 which extends the original PDP model in Shuang et al.¹ to support insert, update, and delete operations on file blocks using an authenticated skip list. Wang et al. implemented another PDP scheme¹¹ that supports fully dynamic operations. In that scheme, a Merkle hash tree (MHT) is used to ensure the correctness of each data block’s position, and the Boneh–Lynn–Shacham (BLS) signature¹ is used to ensure the integrity of the data values. Da et al.¹² proposed a random-sampling data verification method that reduces the computation and communication overhead of possession checking and no longer restricts the number of challenges. However, data integrity is verified by the client, which increases the client’s computation and communication overhead.

Juels et al. proposed the POR model in 2007.⁵ In this model, spot-checking and error-correcting codes are used to ensure both possession and retrievability of data files on a remote server. Special blocks called “sentinels” are randomly embedded into the data file for detection purposes. However, the number of queries a client can perform is fixed, and public verifiability is not supported in their scheme.

Building on the homomorphic authentication tags of Ateniese et al., Shacham and Waters¹³ constructed a homomorphic authenticator based on BLS signatures.¹ These short signatures allow individual signatures to be aggregated into a very small authenticated value, which enables public verifiability. The scheme reduces the communication overhead of the verification process and supports an unlimited number of challenges.

Wang et al.¹⁴ added a random number to the basic BLS signature scheme to guarantee file data privacy during the challenge-response process, but the scheme cannot support insert operations. Based on error-correction coding theory, Long and Guoyin¹⁵ proposed a fine-grained integrity check method, the integrity indication code, which provides significant hash data compression for verifying data integrity at low error rates. Its disadvantage is that the combinatorial codes are designed for only one error in a group of data objects, so the application scenarios are limited. Chen et al. introduced the concept of remote data possession checking (RDPC),¹⁶ which includes PDP and POR schemes. The RDPC protocol is based on a homomorphic hash algorithm and introduces an MHT to support data updates.

In summary, most proposed schemes verify files one by one in challenge-response mode, so the computation and communication overhead is high when verifying massive numbers of files. To solve this issue, this paper proposes a data integrity verification method suited to verifying massive numbers of files in a cloud storage environment. The method can significantly reduce the bandwidth required for data integrity checking and the overhead on both cloud users and the cloud server.

RDPC method

The RDPC method was proposed by Chen Lanxiang; it focuses on how to frequently, efficiently, and securely verify that a storage server faithfully stores its clients’ original data without retrieving it.¹⁶ RDPC includes PDP and POR schemes.¹⁷ Because the proposed method builds on RDPC, RDPC is briefly described below.

Homomorphic tag

Homomorphism is a concept that has been used repeatedly in remote data integrity authentication in recent years. It refers to a structure-preserving map from one algebraic structure to another of the same type, such as groups, rings, or fields. A map $φ : X \to Y$ is a homomorphism if, for all x and y in X, it satisfies

$φ(x \cdot y) = φ(x) \circ φ(y)$ (1)

In equation (1), “$\cdot$” is the operation of X and “$\circ$” is the operation of Y. A homomorphic tag is generated on the basis of a homomorphism. Because of this property, a small amount of data can be used to check a larger amount of data, which meets RDPC’s requirement of low computation and communication overhead. A homomorphic tag can therefore be used to verify data integrity: for any two data blocks $m_i$ and $m_j$, the tag of their sum equals the product of their tags, that is, $T(m_i + m_j) = T(m_i) \times T(m_j)$.
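The tag property can be checked numerically. The following is a minimal sketch that assumes a toy tag $T(m) = g^m \bmod p$; the modulus `P`, the base `G`, and the block values are illustrative assumptions, not parameters from the paper, and real schemes use far larger primes.

```python
# Toy homomorphic tag T(m) = g^m mod p.
# P and G are illustrative assumptions, not the paper's parameters.
P = 1019   # small prime modulus, for demonstration only
G = 2      # base of the exponentiation

def tag(m: int) -> int:
    """Return the tag of an integer-encoded block m."""
    return pow(G, m, P)

m_i, m_j = 123, 456
lhs = tag(m_i + m_j)               # tag of the sum of two blocks
rhs = (tag(m_i) * tag(m_j)) % P    # product of the individual tags
print(lhs == rhs)
```

The equality holds for any pair of blocks because exponents add when powers of the same base are multiplied, which is what lets a verifier check an aggregate of many blocks against an aggregate of their tags.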

Merkle hash tree

The MHT is a widely used authentication structure that can efficiently detect whether an element has been tampered with.^18,19 In general, the MHT is built as a binary tree whose leaf nodes are the hash values of the authenticated data. The MHT structure is shown in Figure 1. To authenticate data $b_2$, we compute $H(b_2)$, $h_3 = h(H(b_1) \| H(b_2))$, $h_1 = h(h_3 \| h_4)$, and $h_0 = h(h_1 \| h_2)$, and then check whether the calculated $h_0$ is the same as the authentic one. Other data are authenticated in the same way.

Figure 1.

Merkle hash tree (MHT) authentic structure.

The MHT is commonly employed to authenticate the values of data blocks.¹⁶ In this paper, however, the MHT is employed to authenticate both the values and the positions of data blocks.²⁰ Using a hash tree instead of traditional file index information means that an operation on any file block does not affect the others, which avoids the high cost of changing indices during dynamic file operations.
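The authentication of $b_2$ described for Figure 1 can be sketched as follows. This is an illustrative sketch only: SHA-256 stands in for both $H(\cdot)$ and $h(\cdot)$, the tree has eight leaves, and the helper names (`build_levels`, `auth_path`, `root_from_path`) are assumptions rather than names from the paper.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash used for leaves and inner nodes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()

def build_levels(blocks):
    """Return all tree levels, leaves first; assumes len(blocks) is a power of two."""
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def auth_path(levels, index):
    """Collect the sibling hashes from leaf `index` up to, but excluding, the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return path

def root_from_path(block, path):
    """Recompute the root from a block and its sibling path."""
    node = h(block)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node

blocks = [f"b{i}".encode() for i in range(1, 9)]   # b1..b8
levels = build_levels(blocks)
root = levels[-1][0]
path_b2 = auth_path(levels, 1)                      # b2 sits at leaf index 1

print(root_from_path(b"b2", path_b2) == root)       # check of the intact block
print(root_from_path(b"bX", path_b2) == root)       # check of a tampered block
```

Because each path entry records whether the sibling is on the left or the right, the same recomputation also binds a block to its position in the tree, which is the property used later for dynamic operations.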

The basic RDPC protocol

Based on Chen et al.’s study,¹⁶ some functions and parameters must first be defined. The basic RDPC needs four cryptographic primitives: $H(\cdot)$ is a homomorphic hash function; $f(\cdot)$ is a pseudo-random function used to generate pseudo-random numbers; $σ(\cdot)$ is a pseudo-random permutation used to determine the locations of the randomly selected data blocks for a given challenge, with values ranging from 1 to n; and $h(\cdot)$ is a cryptographic hash function used to compute the hash tree.

The main parameters are as follows: $λ_p, λ_q$ are security parameters; $p, q$ are random primes satisfying $|p| = λ_p$, $|q| = λ_q$, and $q \mid (p - 1)$; β is the block size; m is the number of sub-blocks per block and satisfies $m = β / (λ_q - 1)$; g is a $1 \times m$ row vector; K is the hash key, composed of $(p, q, g)$; s is a random seed; F is the original file, composed of multiple blocks; R is the hash tree root; and $sig_{sk}(h(R))$ is the signature of the hash root.

RDPC scheme is defined by the following seven polynomial-time algorithms.

$PublicKeyGen(1^k) \to (pk, sk)$.²⁰ This algorithm is run by the client. It takes the security parameter $1^k$ as input and returns the public key $pk$ and the private key $sk$.

$HomomorphicKeyGen(λ_p, λ_q, m, s) \to (p, q, g)$. This algorithm is run by the client. It takes the security parameters $λ_p, λ_q, m$, and s as input and returns the homomorphic key $(p, q, g)$.

$TagGen(sk, K, F) \to (T, sig_{sk}(h(R)))$.²¹ This algorithm is run by the client. It takes a private key $sk$, the homomorphic key K, and a file F as inputs, and outputs the tag set T and the signature $sig_{sk}(h(R))$ of the root R of an MHT.

$ProofGen(F, T, chal) \to V$. This algorithm is run by the server. It takes a file F, its tags T, and a challenge $chal$ as inputs and outputs a data integrity proof V.

$ProofCheck(pk, chal, V) \to \{TRUE, FALSE\}$. This algorithm is run by the client when it receives the proof V. It takes the public key $pk$, the challenge $chal$, and the proof V as inputs, and outputs TRUE if the integrity of the file is verified as correct, or FALSE otherwise.

$ExecUpdate(F, T, update) \to (F', T', P_{update})$. This algorithm is run by the server. It takes a file F, its tags T, and a data operation request “update” as inputs. It outputs an updated file $F'$, updated tags $T'$, and a proof $P_{update}$ for the operation.

$VerifyUpdate(pk, update, P_{update}) \to \{TRUE, FALSE, sig_{sk}(h(R'))\}$. This algorithm is run by the client. It takes the public key $pk$, an operation request “update”, and the proof $P_{update}$ from the server as inputs. If the verification succeeds, it outputs TRUE and a new root signature $sig_{sk}(h(R'))$; otherwise it outputs FALSE.

The basic RDPC consists of two stages: a setup stage and a challenge stage. In the setup stage, the client generates the file authentication information from the functions and parameters above and sends it to the cloud server. In the challenge stage, the client issues a challenge to the cloud server, the server responds with a proof that it holds the challenged file blocks, and the client then checks the proof.

Hierarchical RDPC

Verifying the integrity of cloud data with low overhead has become an increasingly important problem, yet in most proposed methods files must be authenticated one by one when checking massive numbers of contiguous cloud files, which makes the computation and communication overhead high. To address this problem, a hierarchical data authentication method called hierarchical RDPC (H-RDPC) is proposed. The method can be combined with the PDP or POR mode to perform RDPC efficiently.

Basic idea

The proposed H-RDPC is suited to verifying contiguous cloud files. Its theoretical foundation is the locality principle: H-RDPC assumes that if a block of data is intact, then the nearby blocks are also intact with high probability, and conversely, if a data block has been tampered with, then the nearby blocks have also been tampered with, with high probability.

The basic idea of H-RDPC is hierarchical certification in challenge-response mode. The basic process is as follows. First, the first-layer files to be certified are determined by the initial detection grain and the number of files, and these files are verified with a coarse-grained integrity check. If a file check succeeds, its nearby files are considered intact and need no further verification. If a file check fails, its nearby files are considered tampered with and must be verified with a finer-grained integrity check. This continues until the grain size reaches its minimum. H-RDPC therefore verifies only a selected subset of the files, so its computation and communication overhead is lower than that of schemes that verify all cloud files. Moreover, the more files there are to verify, the greater the performance improvement.
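One possible reading of this basic process can be sketched as a recursive schedule. The sketch assumes that a failed file at grain x triggers checks of the files at distance ⌊x/2⌋ on both sides, and that refinement continues only around files that fail; the function name `hierarchical_check`, the callback `is_intact` (standing in for one challenge-response round), and the sample damage pattern are all hypothetical.

```python
def hierarchical_check(n_files, grain, is_intact):
    """Return (checked, failed) file numbers under a recursive-halving reading of H-RDPC.

    is_intact(y) stands in for one challenge-response round on file y.
    """
    checked, failed = set(), set()

    def verify(y, x):
        # Skip files outside 1..n_files and files that were already challenged.
        if y < 1 or y > n_files or y in checked:
            return
        checked.add(y)
        if is_intact(y):
            return                      # nearby files are presumed intact
        failed.add(y)
        half = x // 2
        if half >= 1:                   # refine around a failed file
            verify(y - half, half)
            verify(y + half, half)

    for k in range(1, n_files // grain + 1):
        verify(k * grain, grain)        # first-layer files: x, 2x, 3x, ...
    return checked, failed

damaged = {9, 10, 11, 12}               # hypothetical contiguous damage
checked, failed = hierarchical_check(20, 2, lambda y: y not in damaged)
print(sorted(failed), len(checked))
```

In this example only a subset of the 20 files is challenged, which is the source of the savings; the price is that isolated damaged files lying between first-layer positions can be missed.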

H-RDPC’s algorithm descriptions

H-RDPC includes a setup stage and a challenge stage. The former generates the file authentication information; the latter checks the files on the cloud. The detailed algorithms of the two stages are described as follows:

Setup stage

The client runs $PublicKeyGen(1^k) \to (pk, sk)$ and $HomomorphicKeyGen(λ_p, λ_q, m, s) \to (p, q, g)$, and denotes $K = (p, q, g)$;

The client performs the following operations for every file to be migrated to the cloud:

Run $TagGen(K, F) \to F'$, where $F = (b_1, b_2, \dots, b_n)$, $F' = (b_1, b_2, \dots, b_n; T_1, T_2, \dots, T_n)$, and $f = 0$; f is the file detection flag bit.

For i from 1 to n, compute

$T_i = H_K(b_i) = \prod_{t=1}^{m} g_t^{b_{t,i}} \bmod p$

Use the $T_i$ as leaves to construct the hash tree ω and compute $sig_{sk}(h(R))$. Finally, the client sends $\{F', ω, sig_{sk}(h(R)), f\}$ to the cloud server.
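The tag computation in the setup stage can be sketched with toy numbers. The values p = 23, q = 11, the vector g, and the sample blocks are assumptions chosen so that the arithmetic is easy to follow; real parameters are far larger, and key generation, the hash tree, and the signature are omitted here.

```python
# Toy homomorphic key K = (p, q, g); all values are illustrative assumptions.
p, q = 23, 11            # primes with q | (p - 1), since 22 = 2 * 11
g = [4, 9, 2]            # m = 3 elements of order q modulo p (each is a square mod 23)

def hom_hash(block):
    """H_K(b) = prod_t g_t^{b_t} mod p for a block given as m sub-blocks in Z_q."""
    out = 1
    for g_t, b_t in zip(g, block):
        out = (out * pow(g_t, b_t % q, p)) % p
    return out

# A hypothetical file with n = 3 blocks of m = 3 sub-blocks each.
file_blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
tags = [hom_hash(b) for b in file_blocks]   # T_1..T_n, later used as MHT leaves
print(tags)
```

For the first block the product is $4^1 \cdot 9^2 \cdot 2^3 \bmod 23 = 16$, which can be checked by hand.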

Challenge stage

The total number of files to be checked is N, and the initial detection grain is x (that is, the file interval of the first-layer authentication).

The client generates a random key e and the number of blocks c, and sends $chal = \langle e, c, N, x \rangle$ to the cloud server.

The cloud server calculates $y = k \cdot x$ $(1 \leq k \leq \lfloor N / x \rfloor)$.

The cloud server runs $ProofGen(F', T, chal) \to V$ for each yth file in turn, as follows. (i) Set $B = 0$, $T = 1$. Test whether the flag f of the current file equals 1; if so, skip this file; otherwise continue.

For $i = 1$ to c do

$r_i = σ_e(i)$, $B = B + b_{r_i} \bmod q$, $T = T \times T_{r_i} \bmod p$

done

Read the MHT nodes corresponding to the c blocks, $\{H_K(b_j), ω_j\}_{r_1 \leq j \leq r_c}$, where $\{ω_j\}_{r_1 \leq j \leq r_c}$ are the sibling nodes on the paths from the leaves to the root of the MHT.

The server sends $V = \{B, T, \{H_K(b_j), ω_j\}_{r_1 \leq j \leq r_c}, sig_{sk}(h(R))\}$ to the client.

When the client receives the proof for file $k \cdot x$ $(1 \leq k \leq \lfloor N / x \rfloor)$, it runs $ProofCheck(pk, chal, V) \to \{TRUE, FALSE\}$ as follows. (i) The client generates the root $R'$ using $\{H_K(b_j), ω_j\}_{r_1 \leq j \leq r_c}$ and checks whether $h(R') = h(R)$; if so, it checks whether $H_K(B) = T$. If both checks pass, it returns TRUE; otherwise it returns FALSE. It then sets f = 1 for the corresponding file on the cloud. (ii) The client tests whether $x > 1$; if not, it exits the loop. Otherwise, it sets $y = k \cdot x - \lfloor x / 2 \rfloor$ and issues a challenge for the yth file by sending $chal = \langle e, c, y \rangle$ to the cloud, which generates a proof for that file. The client then sets $y = k \cdot x + \lfloor x / 2 \rfloor$ and checks whether y is greater than N; if so, it exits the loop; otherwise, it issues a challenge for the yth file in the same way. (iii) The cloud server runs $ProofGen(F', T, chal) \to V$ for the yth file.

All files whose check returned FALSE are summarized in a report.
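The aggregation performed in the challenge stage, and the client-side test $H_K(B) = T$, can be sketched as follows. This is a simplified illustration under stated assumptions: the toy key (p = 23, q = 11, g) is not from the paper, a seeded `random.Random(e).sample` stands in for the pseudo-random permutation $σ_e$, and the MHT path check and the root signature are left out.

```python
import random

p, q = 23, 11            # toy primes with q | (p - 1); assumptions for illustration
g = [4, 9, 2]            # elements of order q modulo p

def hom_hash(block):
    """H_K(b) = prod_t g_t^{b_t} mod p."""
    out = 1
    for g_t, b_t in zip(g, block):
        out = (out * pow(g_t, b_t % q, p)) % p
    return out

def challenged_indices(e, c, n):
    """Stand-in for sigma_e: pick c distinct block indices from a PRNG seeded with e."""
    return random.Random(e).sample(range(n), c)

def proof_gen(blocks, tags, e, c):
    """Server side: aggregate the challenged blocks (mod q) and their tags (mod p)."""
    B, T = [0] * len(g), 1
    for r in challenged_indices(e, c, len(blocks)):
        B = [(x + y) % q for x, y in zip(B, blocks[r])]
        T = (T * tags[r]) % p
    return B, T

def proof_check(B, T):
    """Client side: accept only if H_K(B) equals the aggregated tag."""
    return hom_hash(B) == T

blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 0, 5]]   # hypothetical file
tags = [hom_hash(b) for b in blocks]
ok = proof_check(*proof_gen(blocks, tags, e=42, c=4))

corrupted = [list(b) for b in blocks]
corrupted[2][0] = (corrupted[2][0] + 1) % q              # server alters one sub-block
bad = proof_check(*proof_gen(corrupted, tags, e=42, c=4))
print(ok, bad)
```

With such a tiny modulus only a handful of hash values exist, so this toy setting offers no security; it only shows why one aggregated pair (B, T) suffices to check many blocks at once.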

Dynamic data update operations

H-RDPC fully supports dynamic data operations on cloud data storage, including data modification, insertion, and deletion. We assume that the file F and the corresponding tag information have already been generated and properly stored on the server.

Data modification

Data modification is one of the most frequent operations in cloud data storage. Suppose the client wants to modify the ith block $b_i$ to $b_i^*$. The procedure is as follows:

The client generates the tag $T_i^*$ for the new block $b_i^*$, constructs an update request message $(M, i, b_i^*, T_i^*)$, and sends it to the cloud server.

Upon receiving the request, the cloud server runs $ExecUpdate(F, T, update)$: it replaces the block $b_i$ with $b_i^*$ and outputs $F^*$; it replaces $T_i$ with $T_i^*$ and outputs $T^*$; and it obtains the auxiliary authentication information $ω_i^*$. The cloud server replaces $H_K(b_i)$ with $H_K(b_i^*)$ in the MHT and generates the new root $R^*$. It then sends the client a proof, $V_{update} = \{H_K(b_i), ω_i, sig_{sk}(h(R)), R^*\}$.

The client generates the root $R^{**}$ using $\{H_K(b_i), ω_i\}$ and authenticates it by checking whether $h(R^{**})$ equals $h(R)$. If not, it outputs FALSE. Otherwise, it checks whether the server has performed the modification as required by computing the new root value from $\{H_K(b_i^*), ω_i^*\}$ and comparing it with $R^*$. If they differ, it outputs FALSE; otherwise it outputs TRUE. The client then signs the new root $R^*$ and sends it to the server for update.
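The client-side check of a modification can be sketched on a four-leaf tree. The sketch assumes SHA-256 as the tree hash, uses plain byte strings in place of the homomorphic leaf values $H_K(b_i)$, assumes that the sibling path is unchanged by a modification (so $ω_i^* = ω_i$), and omits signature verification; the function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Tree hash (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()

def root_from_path(leaf, path):
    """Fold a leaf hash with its sibling path; entries are (sibling_hash, sibling_is_left)."""
    node = leaf
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node

def verify_modification(old_leaf, new_leaf, path, trusted_root, claimed_new_root):
    """Old leaf must reproduce the trusted root; new leaf must reproduce the claimed R*."""
    if root_from_path(old_leaf, path) != trusted_root:
        return False
    return root_from_path(new_leaf, path) == claimed_new_root

# Four-leaf toy tree; the leaves stand in for H_K(b_i).
leaves = [h(x) for x in (b"b1", b"b2", b"b3", b"b4")]
n01, n23 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
root = h(n01 + n23)

path_b3 = [(leaves[3], False), (n01, True)]      # siblings of the third leaf, bottom-up
new_leaf = h(b"b3-modified")
honest_new_root = h(n01 + h(new_leaf + leaves[3]))

print(verify_modification(leaves[2], new_leaf, path_b3, root, honest_new_root))
print(verify_modification(leaves[2], new_leaf, path_b3, root, root))
```

The second call models a server that claims to have applied the update but returns the old root, which the client rejects because the recomputed root no longer matches.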

Data insertion

Data insertion means inserting a new block after a specified position in file F. Suppose the client wants to insert $b_i^*$ after the ith data block $b_i$. Similar to the data modification case, the insertion procedure is as follows.

The client generates the tag $T_i^*$ for the new block $b_i^*$, constructs an insert request message $(I, i, b_i^*, T_i^*)$, and sends it to the cloud server.

Upon receiving the request, the server runs $ExecUpdate(F, T, update)$: it stores $b_i^*$, adds a leaf node $H_K(b_i^*)$ after the leaf node $H_K(b_i)$ in the MHT, and outputs $F^*$; it adds $T_i^*$ to the tag set and outputs $ω_i^*$. It generates the new root $R^*$ based on the updated MHT. The cloud server then responds to the client with a proof $V_{update} = \{H_K(b_i), ω_i, sig_{sk}(h(R)), R^*\}$.

The client generates the root $R^{**}$ using $\{H_K(b_i), ω_i\}$ and authenticates it by checking whether $h(R^{**})$ equals $h(R)$. If not, it outputs FALSE. Otherwise, the client checks whether the server has performed the insertion as required by computing the new root value from $\{H_K(b_i^*), ω_i^*\}$ and comparing it with $R^*$. If they differ, it outputs FALSE; otherwise it outputs TRUE. The client then signs the new root $R^*$ as $sig_{sk}(h(R^*))$ and sends it to the server.

The client executes the default integrity verification. If the output is TRUE, $sig_{sk}(h(R^*))$, $P_{update}$, and $b_i^*$ are deleted from its local storage.

The data deletion process is the reverse of data insertion and is otherwise similar, so we omit it.

Security analysis and false negative rate analysis

Security analysis

To prove the security of the proposed H-RDPC, we construct a data possession game between the data owner and an adversary. If the adversary wins the game, it means that he accurately possesses all cipher blocks and tags. We give two theorems and their proofs below.

Theorem 1

Under the assumption that large integer factorization is hard, the proposed H-RDPC is secure for file detection.

Proof

We first describe the following game.

Generate keys: the cloud user runs $PublicKeyGen()$ to generate a key pair and sends the public key to the adversary.

Query: the adversary chooses a data block $m_i$ $(1 \leq i \leq n)$ and sends it to the cloud user; the cloud user runs $TagGen()$ to compute the tag $T_i$ for the received block and sends $T_i$ to the adversary.

Challenge: the cloud user issues a challenge $chal$, which requires the adversary to compute a proof of data possession based on the challenge information.

Forge: the adversary generates a proof of data possession V based on the challenge $chal$, the data $m_i$, and the tags $T_i$.

Verification: the cloud user runs $ProofCheck()$ to check V. If $ProofCheck(pk, chal, V) = TRUE$, the adversary wins the game and is taken to possess all ciphertext blocks and tags.

In the game, the cloud user verifies the adversary’s proof of data possession V. When the adversary possesses all data blocks and tags, the proof must satisfy $H_K(B) = T$ and $T_i = H_K(m_i)$. If the adversary has tampered with or deleted a block $m_i$ or a tag $T_i$, he must forge suitable $m_i'$ and $T_i'$ such that $T_i' = H_K(m_i')$ and $H_K(B) = T$. This means that the adversary is able to construct two large random primes $p \neq q$ with $g^p = g^q \bmod N$. Then $p - q = k φ(N)$ holds, and $p - q$ can be used to factor the large integer N. Under the assumption that large integer factorization is hard, however, the adversary cannot win the data possession game unless he possesses all cipher data blocks and tags. This completes the proof.

Theorem 2

The H-RDPC protocol is secure if, for any probabilistic polynomial-time adversary who can win the data possession game with non-negligible probability, there exists an extractor that can extract the challenged blocks from the adversary’s proofs.

Proof

The challenger interacts with the adversary in the data possession game, using the given hash function, and creates the tags. We assume that a collision-resistant hash function exists. The challenger has two sub-entities: an extractor, which extracts the challenged blocks from the adversary’s proof, and a reductor, which breaks the collision resistance of the hash function. If the adversary can win the data possession game, then the extractor can execute $ProofGen()$ repeatedly until it extracts the selected blocks. If the extractor cannot extract the blocks, then the adversary cannot win the game with more than negligible probability.

We consider challenging c blocks. Let $i_1, i_2, \dots, i_c$ be the c challenged indices, and denote the corresponding blocks and tags by $m_{i_1}, m_{i_2}, \dots, m_{i_c}$ and $t_{i_1}, t_{i_2}, \dots, t_{i_c}$, respectively. The extractor outputs the blocks contained in the proof sent by the adversary. If this proof verifies and the adversary wins the game, then either all of the blocks are intact or the reductor breaks the collision resistance of the hash function. Now suppose the extractor fails, which means that there is some $m_{i_j} \neq b_{i_j}$, where $1 \leq i_j \leq n$, $1 \leq j \leq c$, and $1 \leq c \leq n$. The challenger provides the reductor with the blocks in the proof, their block IDs, and the hash function. The reductor then retrieves the original blocks $m_{i_1}, m_{i_2}, \dots, m_{i_c}$ from cloud storage, generates $M = \sum_{j=1}^{c} m_{i_j}$ and $T = \prod_{j=1}^{c} t_{i_j}$, and finally checks whether $H_K(M) = T$. If so, the reductor can break the collision resistance of the hash function.

Therefore, if the adversary has a non-negligible probability of winning the data possession game, the challenger can either extract the blocks or break the collision resistance of the hash function with non-negligible probability. Since the probability of breaking the collision resistance of the hash function is negligible, an adversary from whom the blocks cannot be extracted cannot win the data possession game. This completes the proof.

False negative rate analysis

H-RDPC has a certain false negative rate (FNR). In general, the FNR is lower when the damaged files are contiguous and higher when they are dispersed. An example is designed to illustrate this behavior.

We assume that 60 files stored on the cloud server need to be checked and that 10% of them are randomly damaged. H-RDPC is used to check these files and to measure its FNR. For convenience of analysis, the concept of polymerization degree (PD) is introduced. In this paper, PD is the number of contiguous damaged files; for example, a PD of 3 means that there are three contiguous damaged files. In this example, the maximum PD is set to 1, 2, 4, and 6 in turn. For each PD value there are three random samples (sample1, sample2, sample3); the detailed error distributions of the files are shown in Table 1.

Table 1.

Detailed error distribution of the files (damaged file numbers per sample).

PD	sample1	sample2	sample3
1	1,5,18,20,32,39	3,32,38,40,46,48	6,12,19,23,31,45
2	1,2,15,16,20,21	4,5,27,28,45,46	10,11,34,35,47,48
4	9,10,11,12,25,26	25,26,27,28,37,38	35,36,37,38,41,42
6	10,11,12,13,14,15	36,37,38,39,40,41	42,43,44,45,46,47

PD: polymerization degree.

The initial grain size (IGS) is set to 5 and 2 in turn, and the resulting FNRs of H-RDPC are shown in Figure 2. As the figure shows, the FNRs are generally high when PD is 1: the average FNRs are 77.8% and 44.5% when IGS is 5 and 2, respectively. The FNRs then decrease markedly as PD increases, with average FNRs of 33.3% and 0% when IGS is 5 and 2, respectively, and they fall to 0% when PD increases to 6.

Figure 2.

Hierarchical-remote data possession checking’s (H-RDPC’s) false negative rates (FNRs) with different polymerization degree (PD). (a) Initial grain size (IGS) is 5; (b) initial grain size is 2.

The higher the PD, the lower the FNR. Fortunately, according to the locality principle, files are generally more likely to be damaged in contiguous runs than in dispersed positions, so the FNR can be low in practice. On the other hand, a finer grain should be set when cloud users require a lower FNR; otherwise, a coarser grain can be set.
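The effect of the grain size on dispersed damage can be illustrated with the PD = 1 samples of Table 1. The sketch below computes only the share of damaged files that the first layer (files x, 2x, 3x, ...) does not hit; since refinement can only find additional damaged files, this share is an upper bound on the FNR rather than the FNR itself. The function name `first_layer_miss_rate` is hypothetical.

```python
# PD = 1 rows of Table 1: damaged file numbers for the three samples.
samples_pd1 = [
    [1, 5, 18, 20, 32, 39],
    [3, 32, 38, 40, 46, 48],
    [6, 12, 19, 23, 31, 45],
]

def first_layer_miss_rate(damaged, grain):
    """Fraction of damaged files that are not multiples of the initial grain size."""
    missed = [f for f in damaged if f % grain != 0]
    return len(missed) / len(damaged)

for grain in (5, 2):
    rates = [first_layer_miss_rate(s, grain) for s in samples_pd1]
    average = 100 * sum(rates) / len(rates)
    print(f"IGS {grain}: average first-layer miss rate {average:.1f}%")
```

For these samples the bound averages about 77.8% at IGS 5 and about 44.4% at IGS 2, in line with the average FNRs reported above for PD = 1, which suggests that refinement recovers little when the damage is fully dispersed.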

Performance evaluation

To evaluate the performance of H-RDPC, we conducted the following experiments. The experiments were run on PCs with an Intel i5 2.5 GHz CPU and 4 GB of main memory, with RDPC chosen as the baseline. H-RDPC and RDPC were implemented with the PBC library 0.5.11 under Ubuntu 12.04. We consider both communication and computation overhead.

Experiment 1: H-RDPC performance evaluation with different numbers of files

In this experiment, the file size is 500 MB and the block size is 512 KB. Ten percent of the files are randomly damaged, and the numbers of files to be verified are 100, 200, 400, and 1000, respectively. H-RDPC’s minimum PD is 2 and its IGS is 2. RDPC is the baseline. The experimental results are shown in Figure 3. As the figure shows, the authentication time increases with the number of files. When verifying 100 files, the RDPC authentication time is 8.3 min, while the H-RDPC authentication time is only 4.8 min, a reduction of 42.2%. When verifying 1000 files, the RDPC authentication time is 83.3 min, while the H-RDPC authentication time is only 48.3 min, a reduction of 42%. On average, H-RDPC improves performance by 41.9% over RDPC, and the FNR is 0%.

Figure 3.

Comparison of hierarchical-remote data possession checking (H-RDPC) and RDPC performance with different file numbers.
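The reductions quoted for Experiment 1 follow from the relative difference between the two authentication times. A minimal sketch of that arithmetic, with the hypothetical helper name `reduction_percent`:

```python
def reduction_percent(baseline_min: float, improved_min: float) -> float:
    """Relative reduction of the improved time with respect to the baseline, in percent."""
    return 100 * (baseline_min - improved_min) / baseline_min

# Authentication times (minutes) reported for 100 and 1000 files.
print(round(reduction_percent(8.3, 4.8), 1))
print(round(reduction_percent(83.3, 48.3), 1))
```

The same formula applies to the other experiments, with the RDPC time as the baseline in each case.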

Experiment 2: H-RDPC performance evaluation with different proportions of damaged files

As in experiment 1, the file size is 500 MB, the block size is 512 KB, H-RDPC’s minimum PD is 2, and its IGS is 2. The total number of files checked is 500. The proportions of randomly damaged files are 1%, 5%, 10%, and 20%, respectively. RDPC is the baseline. The experimental results are shown in Figure 4. As the figure shows, as the proportion of damaged files increases, the RDPC authentication time stays fixed (41.7 min), while the H-RDPC authentication time increases slowly. H-RDPC improves performance by 48.9%, 45.6%, 40%, and 32.1% when the proportion of damaged files is 1%, 5%, 10%, and 20%, respectively. On average, H-RDPC improves performance by 41.7% over RDPC.

Figure 4.

Comparison of hierarchical-remote data possession checking (H-RDPC) and RDPC performance with different damage file proportions.

Experiment 3: H-RDPC performance evaluation with different grain sizes

For convenience, we assume that 10% of the files are damaged contiguously; the file size is 500 MB, the block size is 512 KB, and the number of files is 500. H-RDPC with IGS values of 2, 5, and 10 is compared with RDPC. The experimental results are shown in Figure 5. As the figure shows, the RDPC authentication time is 41.6 min, while the H-RDPC authentication time is 22.4 min, 11.5 min, and 7.9 min when the initial detection grain size is 2, 5, and 10, respectively. The corresponding performance improvements are 46.2%, 72.4%, and 81%. On average, H-RDPC improves performance by 65.8% compared with RDPC.

Figure 5.

Comparison of hierarchical-remote data possession checking (H-RDPC) and RDPC performance with different initial grain sizes (IGSs).

Experiment 4: H-RDPC performance evaluation with different file sizes

The number of files checked is 500 and the block size is 512 KB. The proportion of randomly damaged files is 5%. H-RDPC’s IGS is 2, and the file sizes are 100 MB, 200 MB, 500 MB, and 1000 MB, respectively. RDPC is the baseline. The experimental results are shown in Figure 6. As the figure shows, the verification time of both RDPC and H-RDPC increases with file size, and the H-RDPC verification time is clearly lower. The RDPC time increases from 8.3 min (100 MB) to 83.3 min (1000 MB), while the H-RDPC time increases from 4.4 min (100 MB) to 44.3 min (1000 MB). On average, the H-RDPC verification time is 46.5% lower than that of RDPC.

Figure 6.

Comparison of hierarchical-remote data possession checking (H-RDPC) and RDPC performance with different file sizes.

Conclusion

This paper proposes a hierarchical data possession checking method, H-RDPC, which can efficiently verify massive numbers of files on the cloud and supports dynamic data operations and an unlimited number of verifications. The evaluation results show that, under the experimental settings used, H-RDPC improves performance by approximately 32–81% over RDPC while keeping a low miss rate. Although some errors may remain because of certain limitations, the analyses and experimental evaluations in this paper indicate that H-RDPC is an effective and feasible remote data integrity protection scheme.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: the National Natural Science Foundation of China under Grant No. 61671169, and Science and Technology Research Project in Education Department of Heilongjiang Province No. 12533052.

