Abstract
With the rapid development of cloud computing, an increasing number of data owners are willing to employ cloud storage services. In cloud storage, resource-constrained data owners can outsource their large-scale data to the remote cloud server, thereby greatly reducing local storage overhead and computation cost. Despite its many attractive advantages, cloud storage inevitably suffers from new security challenges due to the separation of outsourced data ownership and management, such as secure data insertion and deletion. The cloud server may maliciously retain some data copies and return a wrong deletion result to cheat the data owner. Moreover, it is very difficult for the data owner to securely insert new data blocks into the outsourced data set. To solve these two problems, we adopt the primitive of the Merkle sum hash tree to design a novel publicly verifiable cloud data deletion scheme, which simultaneously achieves provable data storage and dynamic data insertion. An interesting property of our proposed scheme is that it satisfies both private and public verifiability without requiring any trusted third party. Furthermore, we formally prove that our proposed scheme not only achieves the desired security properties but also realizes high efficiency and practicality.
Keywords
Introduction
Cloud computing is the fusion, development, and application of the concepts of parallel computing, grid computing, and distributed computing.1,2 It can connect large-scale distributed resources through the network and form a pool of computing resources to provide tenants with a series of on-demand services, such as data sharing service,3,4 data migration service, 5 and data storage service (i.e. cloud storage service).6,7 Thanks to their many attractive advantages, these services have been widely applied in daily life and work, especially the cloud storage service. By employing the cloud storage service, resource-constrained data owners, including individuals and enterprises, can outsource their large-scale data to the remote cloud server, which helps them greatly reduce local software/hardware overhead and human resource investments.
Although employing cloud storage service is economically attractive, it inevitably suffers from some new security problems. The data owners lose direct control over their outsourced data due to the separation of data ownership and management.8,9 Therefore, outsourced data security is the major concern in cloud storage. Previous work mainly studied the existence of outsourced data, that is, data integrity, which has already been well solved.10–12 However, we focus on a complementary issue, that is, secure data deletion, which has received much less attention than data integrity. In cloud storage, the outsourced data deletion operation is executed by the remote cloud server. However, the selfish cloud server might maliciously retain some data backups for the following two reasons. 13 On the one hand, the cloud server can dig some implicit benefits from the retained data. On the other hand, the outsourced data deletion operation may incur expensive computational overhead, which is undesirable from the cloud server’s point of view. Hence, the cloud server may maliciously retain some data and return a wrong data deletion result to cheat the data owner.
Some solutions have been proposed to solve the problem of secure data deletion; however, there are still some inherent limitations in the existing data deletion schemes. On the one hand, most of the existing verifiable outsourced data deletion schemes need to introduce a trusted third party (TTP),14–16 which would become a bottleneck because it may refuse to serve due to the heavy computational overhead. Moreover, the security of the TTP itself is a concern, since it will more easily attract the adversary’s attention. Meanwhile, the TTP cannot resist the orders of the court and government. On the other hand, the efficiency of the existing data deletion schemes is unsatisfactory, especially for those schemes that achieve deletion by overwriting.17,18 In the overwriting model, the outsourced data must be overwritten by random data of the same size as the outsourced data. However, the outsourced data might be large-scale, resulting in heavy computational cost and communication overhead. Therefore, how to efficiently achieve publicly verifiable outsourced data deletion without requiring any TTP remains an open problem.
Besides secure data deletion, dynamic data insertion is another concern. In cloud storage, data insertion has become a fundamental requirement of the data owner. The data insertion operation is executed by the remote cloud server because the data owner loses direct control over the outsourced data. However, the cloud server might not honestly perform the data insertion operation since it costs computing resources and storage space. 19 Therefore, how to conveniently insert new data blocks into the outsourced data set and efficiently verify the data insertion result is another severe security challenge. To the best of our knowledge, there is no existing publicly verifiable outsourced data deletion scheme that simultaneously supports dynamic data insertion without requiring any TTP. Therefore, it is very meaningful to design a publicly verifiable data deletion scheme with dynamic data insertion for cloud storage.
Our contributions
In this article, we mainly study the problem of designing a publicly verifiable outsourced data deletion scheme that simultaneously supports dynamic data insertion in cloud storage. The main contributions of this article are summarized in the following two aspects.
We adopt the Merkle sum hash tree (MSHT) to propose a new publicly verifiable data deletion scheme for cloud storage. The proposed scheme also simultaneously achieves provable data storage and dynamic data insertion. Moreover, by exploiting the advantages of the MSHT, the proposed scheme achieves public verifiability without requiring any TTP, which is better than most of the existing solutions.
We formally prove that the proposed scheme satisfies the desired security requirements through detailed security analysis. Moreover, we empirically evaluate the performance overhead of the proposed scheme through simulation experiments, which demonstrates that the proposed scheme is efficient for resource-constrained data owners, as well as practical in real-world applications.
Related work
The problem of secure data deletion has been studied for several decades, resulting in a great number of solutions.20–25 The existing data deletion schemes can be categorized into three types according to their deletion approaches. The first approach to achieve outsourced data deletion is unlinking. When the data owner wants to delete a file, the storage system removes the file link from the underlying file system and returns a one-bit reply (success/failure) to the data owner. Although unlinking deletes the file link very efficiently, the content of the file still remains on the storage medium. The adversaries could easily adopt tools that scan the disk to recover the deleted file. 25
The second approach to realize outsourced data deletion is overwriting.26–30 The main idea is that the storage system utilizes random data to overwrite the disk that maintains the data. Paul and Saxena 31 designed a new cloud data deletion scheme called “Proof of erasability” (PoE), which aims to provide the data owner with the ability to verify the data deletion result. Their scheme uses random patterns to overwrite the disk and returns the same patterns as the data deletion evidence. Luo et al. 32 adopted the hourglass function to design a new assured data deletion scheme. In their proposed scheme, they assume that the cloud server only maintains the latest version of the data owner’s file, and all the file backups are kept consistent when they are updated. Therefore, they achieve data deletion by updating all the file backups with random data. That is, they disguise the overwriting operation as a data updating operation. Finally, the data owner can verify the returned data deletion result through a challenge-response protocol.
The third approach to achieve outsourced data deletion is destroying the data decryption key. The main idea is that the data owner encrypts the file before outsourcing. Therefore, when the data is no longer needed, the data owner can destroy the corresponding decryption key to achieve data deletion. Boneh and Lipton 33 proposed the first cryptography-based solution to the secure data deletion problem, with a series of follow-up schemes.20,24,34–36 Recently, Yu et al. 37 adopted attribute-based encryption to design a new assured data deletion protocol, which also simultaneously achieves fine-grained access control. Hao et al. 16 used the trusted platform module (TPM) to design a publicly verifiable deletion scheme for secret data. Yang et al. 24 adopted the invertible Bloom filter (IBF) to propose a fine-grained data deletion scheme. However, the above schemes need to introduce a TTP (e.g. an attribute authority), similar to other schemes.35,38 The TTP would become a bottleneck that impedes the development of the verifiable outsourced data deletion system. To solve this problem, Yang et al. 20 presented a new publicly verifiable data deletion scheme for cloud storage, which is characterized by confidential data storage, data integrity verification, and verifiable data deletion. Their scheme makes use of blockchain to achieve publicly verifiable deletion without requiring any TTP.
Moreover, Yu et al. 39 focused on how to simultaneously achieve verifiable data transfer and deletion in cloud computing. Their scheme satisfies the properties of data integrity, data confidentiality, secure data transfer, and verifiable data deletion. To the best of our knowledge, their scheme is the first solution to the problem of efficient data migration and provable transferred data deletion. Later, Xue et al. 40 proposed a similar scheme, which improves the efficiency of the data deletion process compared to the earlier scheme. 39 However, Liu et al. 41 pointed out a security flaw in that scheme 40 and then proposed an improved verifiable data transfer protocol, which also achieves verifiable transferred data deletion. However, the above data transfer and deletion schemes need a third party auditor (TPA). To remove the TPA, Yang et al. 42 designed a publicly verifiable data transfer and deletion scheme based on vector commitment (VC). Their scheme allows the data owner to migrate outsourced data from the original cloud server to the target cloud server and then delete the transferred data from the original cloud server.
Organization
The rest of this article is organized as follows: In section “Preliminaries,” we describe the preliminary of MSHT, which can be viewed as an extension of the traditional Merkle hash tree (MHT). In section “Models and requirements,” we formalize the system model, present the main security challenges, and identify the design goals. In section “Assured deletion with dynamic insertion,” we propose a concrete publicly verifiable data deletion scheme that also supports dynamic data insertion. The detailed security analysis and the performance evaluation are provided in sections “Security analysis” and “Performance evaluation,” respectively. Finally, we conclude this article in section “Conclusion.”
Preliminaries
Merkle sum hash tree (MSHT) was first proposed by Miao et al.,43 and can be viewed as an extension of the MHT.44 Different from the MHT, MSHT can support dynamic data insertion and deletion. Therefore, MSHT can be utilized to design a dynamic data insertion and verifiable data deletion scheme. Moreover, the inputs of the MSHT differ from those of the MHT, as shown in Figure 1. For every leaf node, the inputs are the data items and the number of the data items that this leaf node contains, for example,
Figure 1. An example of MSHT.
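The structure above can be illustrated with a short, self-contained Python sketch. Everything concrete here is our own illustrative choice rather than the notation of Miao et al.: the names `leaf`, `parent`, and `build`, the 4-byte count encoding, and the use of SHA-256. Each leaf hashes its item count together with its data items, and each internal node hashes the sum of its children's counts together with their digests.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class Node:
    def __init__(self, count, digest, left=None, right=None):
        self.count = count    # number of data items in this subtree
        self.digest = digest  # hash value of this node
        self.left, self.right = left, right

def leaf(items):
    # a leaf binds both its data items and how many items it holds
    payload = len(items).to_bytes(4, "big") + b"".join(items)
    return Node(len(items), h(payload))

def parent(l, r):
    # an internal node binds the sum of its children's counts
    total = l.count + r.count
    return Node(total, h(total.to_bytes(4, "big") + l.digest + r.digest), l, r)

def build(leaves):
    level = leaves
    while len(level) > 1:
        nxt = [parent(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:       # an odd node is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

# a 3-leaf MSHT over six data items
root = build([leaf([b"d1", b"d2"]), leaf([b"d3"]), leaf([b"d4", b"d5", b"d6"])])
```

The root's count field then equals the total number of outsourced items, so the root authenticates the cardinality of the data set as well as its contents.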
The MSHT can support dynamic data insertion and deletion, which is very attractive. Once a data insertion operation is executed, the MSHT should also be updated simultaneously. Without loss of generality, we assume that the data owner wants to insert a new data block
Figure 2. An example of data insertion in MSHT.
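As a sketch of the update rule (a toy two-leaf tree; SHA-256 and the 4-byte count prefix are again our illustrative encoding), inserting a block into a leaf changes that leaf's hash and count, so every node on the path from that leaf up to the root must be recomputed:

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def leaf_digest(items):
    # a leaf hash binds the item count and the items themselves
    return h(len(items).to_bytes(4, "big") + b"".join(items))

def node_digest(count, left, right):
    return h(count.to_bytes(4, "big") + left + right)

# a two-leaf MSHT before insertion: three items in total
left_items, right_items = [b"b1", b"b2"], [b"b3"]
root_before = node_digest(3, leaf_digest(left_items), leaf_digest(right_items))

# insert a new block into the right leaf: only the nodes on the
# path from that leaf to the root need to be recomputed
right_items.append(b"b4")
root_after = node_digest(4, leaf_digest(left_items), leaf_digest(right_items))
```

Because the counts are hashed in at every level, a server that silently drops a block or keeps an extra one can no longer reproduce the expected root.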
Similarly, after executing a data deletion operation, the MSHT is also updated. If the leaf node only maintains the data block that needs to be deleted, the leaf node is removed and the MSHT is updated, as shown in Figure 3. Otherwise, the cloud server only removes the data block that needs to be deleted, while the other data blocks remain in the leaf node, as shown in Figure 4.
Figure 3. An example of data deletion in MSHT.
Figure 4. Another example of data deletion in MSHT.
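The two deletion cases can be mirrored in a small sketch (SHA-256 and a 4-byte count prefix are our illustrative encoding, and the three-child root node is a simplification of the binary tree in the figures):

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def leaf_digest(items):
    return h(len(items).to_bytes(4, "big") + b"".join(items))

def node_digest(count, *children):
    return h(count.to_bytes(4, "big") + b"".join(children))

l1, l2, l3 = [b"b1"], [b"b2", b"b3"], [b"b4"]
root = node_digest(4, leaf_digest(l1), leaf_digest(l2), leaf_digest(l3))

# Case of Figure 4: the leaf still holds other blocks, so only the
# target block is removed and the leaf hash is recomputed
l2.remove(b"b3")
root_fig4 = node_digest(3, leaf_digest(l1), leaf_digest(l2), leaf_digest(l3))

# Case of Figure 3: the leaf holds nothing but the deleted block,
# so the whole leaf is removed from the tree
root_fig3 = node_digest(2, leaf_digest(l1), leaf_digest(l2))
```

In both cases the root changes, so the updated root (together with the deletion evidence) lets the data owner check that exactly the requested blocks disappeared.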
Remark 1
When the data owner wants to insert a new data block into the
Remark 2
To check whether the MSHT maintains the data blocks honestly, the verifier also needs to utilize the auxiliary authentication information
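In MHT-style proofs, the auxiliary authentication information (AAI) is the list of sibling nodes on the path from the checked leaf to the root; for an MSHT each sibling carries a (count, digest) pair. A minimal verification sketch under our illustrative encoding (SHA-256, 4-byte counts, and a `sib_on_left` flag that we introduce to order the concatenation):

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def verify(leaf_count, leaf_digest, aai, root_count, root_digest):
    # walk up the tree, folding in one (count, digest) sibling per level
    count, digest = leaf_count, leaf_digest
    for sib_count, sib_digest, sib_on_left in aai:
        count += sib_count
        pair = sib_digest + digest if sib_on_left else digest + sib_digest
        digest = h(count.to_bytes(4, "big") + pair)
    # both the recomputed digest and the recomputed item count must match
    return count == root_count and digest == root_digest

# a two-leaf example: leaf A holds 1 item, leaf B holds 2 items
dA = h((1).to_bytes(4, "big") + b"x1")
dB = h((2).to_bytes(4, "big") + b"x2" + b"x3")
root_count, root_digest = 3, h((3).to_bytes(4, "big") + dA + dB)

ok = verify(2, dB, [(1, dA, True)], root_count, root_digest)
```

Checking the count as well as the digest is what lets the verifier detect a server that quietly maintains more (or fewer) blocks than it claims.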
Models and requirements
In this section, we first formalize the system model of our novel scheme. Then, we describe the main security threats and design goals.
System model
In the system model of our scheme, there are two entities: a data owner and a cloud server, as shown in Figure 5. The data owner is a resource-constrained entity who wants to outsource his large-scale personal data to the remote cloud server to greatly save local storage overhead. Later, the data owner may insert new data blocks into the outsourced data set. Meanwhile, the data owner wants to permanently delete some data blocks when he no longer needs them. The cloud server has powerful computing ability and almost unlimited storage resources, and provides data storage service for data owners. Meanwhile, it executes the data insertion and deletion operations according to the data owner’s commands and returns the corresponding evidence. Therefore, the data owner can efficiently check the data insertion and deletion results by verifying the returned evidence.
Figure 5. The system model of our scheme.
Security challenges
In our scheme, we mainly consider the following three security challenges.
Data privacy leakage
The adversaries, such as hackers and curious cloud administrators, want to learn sensitive information from the outsourced data, which should be kept secret from the data owner’s point of view. Moreover, the selfish cloud server may share the data with its business partners, which may cause data leakage. Furthermore, software/hardware failures might also cause data leakage. Hence, data privacy leakage is a severe challenge.
Dishonest data reservation
The selfish cloud server might not honestly delete the outsourced data for the following two reasons. Firstly, the outsourced data may contain valuable information; therefore, the cloud server may maliciously retain some data backups to dig implicit benefits. Secondly, the data deletion operation may incur expensive computational overhead, which is undesirable from the cloud server’s point of view. As a result, dishonest data reservation is another security challenge.
Malicious slander
Both the data owner and the cloud server may deny their behaviors and maliciously slander the other. On the one hand, the malicious cloud server might arbitrarily delete some data but claim that it deleted the data according to the data owner’s command. On the other hand, a dishonest data owner who had required the cloud server to delete the data might, when he needs the data later, slander the cloud server by claiming that it arbitrarily deleted the outsourced data without his permission.
Design goals
In the proposed scheme, we should achieve the following three design goals.
Assured deletion with dynamic insertion
Overview
We consider the verifiable data deletion model in cloud storage, which is very similar to that of previous schemes.20,24 In our scenario, there is a trust problem between the data owner and the cloud server. That is, the data owner does not believe that the cloud server will honestly execute data insertion and deletion operations. To solve this trust problem, plenty of solutions have been proposed, and most of them require a TTP. However, the TTP would become a bottleneck that impedes the development of the verifiable data deletion system. To remove this bottleneck, we design an MSHT-based data deletion scheme, which also achieves dynamic data insertion without requiring any TTP.
The main processes of the proposed scheme are described in Figure 6. In the first step, the data owner encrypts the outsourced file. After that, the data owner divides the ciphertext into
Figure 6. The main process of our proposed scheme.
The concrete construction
In the following, we propose a novel scheme, which simultaneously achieves data insertion and deletion in cloud storage. Note that the cloud service provider must authenticate the identity of the data owner before providing him with data storage service. For simplicity, we assume that the data owner has already passed the verification and become a legal user. Hence, the data owner can directly enjoy the cloud storage service. The concrete construction is described as follows:
The data owner first computes a data encryption key
Secondly, the data owner divides the ciphertext
Finally, the data owner sends the outsourced data set
The cloud server maintains the received data set
Upon receiving the storage evidence
The data owner first retrieves all the data blocks that are maintained by the leaf node
On receipt of the data insertion command
Finally, the data owner re-computes the root node
The data owner first retrieves all the data blocks that are maintained by the leaf node
On receiving the data deletion command
Upon receiving
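The upload-and-verify flow in the steps above can be sketched end to end. Everything concrete in this sketch is our own stand-in rather than the scheme's actual algorithms: the key derivation, the SHA-256 counter-mode toy cipher, one ciphertext block per leaf, and an HMAC in place of the cloud server's public-key signature on the storage evidence (a real deployment would use an actual signature so the evidence is publicly verifiable).

```python
import hashlib
import hmac
import os

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # SHA-256 counter-mode keystream; XOR makes it its own inverse.
    # A stand-in for the scheme's IND-CPA symmetric cipher, not real crypto.
    out = bytearray()
    for i in range(0, len(data), 32):
        ks = H(key + i.to_bytes(8, "big"))
        out += bytes(a ^ b for a, b in zip(data[i:i + 32], ks))
    return bytes(out)

def split_blocks(ct: bytes, n: int):
    size = -(-len(ct) // n)  # ceiling division
    return [ct[i * size:(i + 1) * size] for i in range(n)]

def msht_root(blocks):
    # one ciphertext block per leaf, for simplicity
    counts = [1] * len(blocks)
    level = [H((1).to_bytes(4, "big") + b) for b in blocks]
    while len(level) > 1:
        nc, nl = [], []
        for i in range(0, len(level) - 1, 2):
            c = counts[i] + counts[i + 1]
            nc.append(c)
            nl.append(H(c.to_bytes(4, "big") + level[i] + level[i + 1]))
        if len(level) % 2:       # an odd node is carried up unchanged
            nc.append(counts[-1])
            nl.append(level[-1])
        counts, level = nc, nl
    return counts[0], level[0]

# --- data owner: derive key, encrypt, split, build the MSHT root ---
file_data = os.urandom(1000)
key = H(b"owner secret" + b"file-id")        # hypothetical key derivation
ciphertext = toy_encrypt(key, file_data)
blocks = split_blocks(ciphertext, 8)
count, root = msht_root(blocks)

# --- cloud server: rebuild the tree and return the storage evidence ---
server_key = b"server signing key"           # HMAC stands in for a signature
s_count, s_root = msht_root(blocks)
evidence = hmac.new(server_key, s_root, hashlib.sha256).digest()

# --- data owner: check the evidence against the locally computed root ---
expected = hmac.new(server_key, root, hashlib.sha256).digest()
storage_ok = (s_count == count) and hmac.compare_digest(evidence, expected)
```

The owner only needs to keep the root (and later the updated roots) locally; all bulk storage and tree maintenance stays on the server side.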
Security analysis
Lemma 1
Assume that
Proof
We first define some games.
Theorem 1
The proposed scheme can satisfy outsourced data confidentiality.
Proof
Data confidentiality ensures that only the data owner can decrypt the outsourced ciphertext correctly. In the proposed scheme, the outsourced file is encrypted with an IND-CPA secure symmetric encryption algorithm before uploading. Therefore, based on Lemma 1, only the data owner can decrypt the ciphertext correctly. 45 That is, the proposed scheme satisfies outsourced data confidentiality.
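As a toy illustration of why only the key holder can decrypt, consider a SHA-256 counter-mode keystream (our own stand-in, not the cipher used in the paper; a fixed keystream without a per-file random nonce would not actually be IND-CPA, so this only demonstrates key dependence, not the full security property):

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # SHA-256 counter mode; illustrative only, not production crypto
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = hashlib.sha256(b"data owner's secret").digest()
msg = b"outsourced file contents"
ct = xor(msg, keystream(key, len(msg)))

recovered = xor(ct, keystream(key, len(ct)))            # correct key
wrong_key = hashlib.sha256(b"adversary's guess").digest()
garbage = xor(ct, keystream(wrong_key, len(ct)))        # wrong key
```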
Theorem 2
The proposed scheme can satisfy verifiable outsourced data deletion.
Proof
Verifiable data deletion guarantees that if the cloud server does not honestly delete the data blocks, it cannot generate an effective deletion evidence to persuade the data owner that the data blocks have been deleted. In the proposed scheme, the data owner inserts
Meanwhile, note that in the data deletion result verification process, the verifier uses the specific leaf node, the corresponding auxiliary authentication information, and the public key of the cloud server to verify the data deletion result. The leaf node maintains the ciphertext blocks of the outsourced file. That is, the verification process does not need any private information of the data owner or the cloud server. Furthermore, any verifier who owns the data deletion evidence can verify the data deletion result. Therefore, the proposed scheme achieves publicly verifiable data deletion.
Theorem 3
The proposed scheme can satisfy accountable traceability.
Proof
Accountable traceability guarantees that a malicious participant cannot successfully slander the other. The detailed analyses are presented as follows.
Dishonest data owner
If the data owner is dishonest, he may maliciously deny his behavior and slander the cloud server. Specifically, the dishonest data owner had already required the cloud server to delete some data blocks. However, when the data owner needs the deleted data blocks later, he may deny his deletion command and claim that the cloud server arbitrarily deleted those data blocks. In this case, the cloud server can show the deletion command
Malicious cloud server
Similarly, the malicious cloud server may also dishonestly slander the data owner. Specifically, the malicious cloud server may arbitrarily delete some data blocks that are rarely accessed, in order to save overhead. If the dishonest data deletion is discovered, the malicious cloud server may falsely claim that it deleted the data blocks according to the data owner’s command. In this case, the data owner can require the cloud server to demonstrate the deletion command
Performance evaluation
In this section, we first compare the functionality of the proposed scheme and some existing solutions. Then, we provide the numeric analysis. Finally, we provide the efficiency evaluation through simulation experiments.
Functionality comparison
In this part, we compare the functionality of the proposed scheme with three existing solutions16,24,40 in theory, and the comparison results are shown in Table 1.
Functionality comparison.
From Table 1, we can easily obtain the following four findings. Firstly, only scheme 40 does not consider data confidentiality, while the other three schemes can all guarantee outsourced data confidentiality. Secondly, all four schemes are able to achieve publicly verifiable deletion for outsourced data, but only scheme 16 cannot satisfy the property of accountable traceability, which is very different from the other three solutions. Thirdly, dynamic and efficient data insertion has become one of the most fundamental requirements from the data owner’s point of view. The proposed scheme provides the data owner with the ability to dynamically insert new data blocks into the outsourced data set, while the other three schemes cannot achieve dynamic data insertion. Last but not least, the TTP could become a bottleneck that impedes the development of the verifiable data deletion system. Our proposed scheme removes the TTP, which makes it better than the other three schemes. Overall, our proposed scheme is much more attractive.
Numeric analysis
In the following, we describe the theoretical computational complexity in each process. For simplicity, we utilize the symbols
Computational complexity comparison.
From Table 2, we can find that our proposed scheme needs one hash calculation and one encryption operation to encrypt the outsourced file. However, scheme 16 requires four hash computations and two encryption operations, and scheme 24 needs
Simulation results
In this part, we simulate the proposed scheme based on the open secure socket layer (OpenSSL) library and the pairing-based cryptography (PBC) library. In our experiments, we choose SHA-1 as the secure hash function. Moreover, all the experiments are executed on a Linux machine with an Intel(R) Core(TM) i5-6200U processor running at 2.4 GHz with 8 GB of main memory. For simplicity, we ignore some other computations, such as addition, multiplication, and the communication cost.
Computation of data encryption
Note that the dominant computational overhead in this phase is encrypting the outsourced file. Hence, we increase the size of the encrypted file from 1 to 10 MB with a step of 1 MB in this experiment. Meanwhile, we fix the number
Figure 7. The time cost of encrypting.
Computation of data storage
In this process, the main computational overhead comes from data storage evidence generation and verification, which increases with the number of outsourced data blocks. Therefore, we increase the number of outsourced data blocks from
Figure 8. The time cost of storage verification.
Computation of data insertion
For simplicity, we assume that the new blocks will not be inserted into leaf nodes that have the same parent node. We increase the height of the MSHT from 10 to 50 with a step of 20, and increase the number of inserted data blocks from 5 to 50 with a step of 5. After that, we can test the approximate time overhead, as shown in Figure 9, where
Figure 9. Total time cost in data insertion.
Moreover, we test the data owner’s time overhead in the data insertion process, as shown in Figure 10. We can easily find that the growth rate is very low. Moreover, compared with the total time shown in Figure 9, the time cost shown in Figure 10 is much less, which means that most of the computations are executed by the cloud server. Meanwhile, note that the data insertion result verification is a one-time operation, and it can be completed off-line by the data owner. Therefore, our proposed scheme is efficient for the data owner in the data insertion phase.
Figure 10. Data owner’s time cost in data insertion.
Computation of data deletion
In this process, the dominant computational overhead comes from data deletion command generation, data deletion evidence generation, and data deletion result verification. In this experiment, we fix the number of outsourced data blocks in
Figure 11. Time cost of data deletion.
Conclusion
In cloud storage, there is a trust problem between the data owner and the cloud server. That is, the data owner does not believe that the cloud server will honestly execute outsourced data insertion and deletion operations. To solve this trust problem, we adopt the primitive of MSHT as a building block to design a publicly verifiable outsourced data deletion scheme, which also achieves provable data storage and dynamic data insertion. If the cloud server does not behave honestly, the data owner can detect the dishonest acts with overwhelming probability. By utilizing the attractive advantages of MSHT, the proposed scheme achieves both private and public verifiability without requiring any TTP. Moreover, we prove that the proposed scheme satisfies the desired design goals through formal security analysis. Finally, we simulate the proposed scheme and provide the efficiency evaluation, which intuitively demonstrates the efficiency and practicality of the proposed scheme.
Footnotes
Handling Editor: Jerzy Balicki
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Opening Project of Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, the National Natural Science Foundation of China (No. 61962015), the Natural Science Foundation of Guangxi (No. 2016GXNSFAA380098), and the Science and Technology Program of Guangxi (No. AB17195045).
