Ghosh, Kamara, and Tamassia (GKT) (ASIA CCS 2021) proposed a graph encryption scheme supporting shortest path queries. This work presents a query recovery attack against the scheme when the adversary is given the original graph and the leakage of certain subsets of queries. The attack falls within the security model used by GKT, and is the first targeting schemes supporting shortest path queries. The attack uses classical graph algorithms to compute the canonical names of the single-destination shortest path spanning trees of the underlying graph and uses these canonical names to precompute the set of candidate queries that match each response. When all shortest path queries to a single node have been observed, the canonical names for the corresponding query tree are computed, and the responses are matched to the candidate queries from the offline phase. The output is guaranteed to contain the correct query. For a graph on vertices, the attack runs in time and matches the time complexity of the GKT scheme’s setup. The attack’s practicality is demonstrated through an implementation and evaluation on the real-world datasets used in the original paper and on random graphs.
Graphs are a powerful tool that can be used to model many problems related to social networks, biological networks, geographic relationships, etc. Plaintext graph database systems have already received much attention in both industry (e.g. Amazon Neptune,1 Facebook TAO,2 Neo4j,3 and GraphDB4) and academia (e.g. Pregel,5 GraphLab,6 and Trinity7).
With the rise of data storage outsourcing, there is an increased interest in graph encryption schemes (GESs). A GES enables a client to encrypt a graph, outsource the storage of the encrypted graph to an untrusted server, and later make certain types of graph queries to the server. Current GES typically only support one type of query, for example, adjacency queries,8 neighbor queries,8 approximate shortest distance queries,9 and exact shortest path queries.10,11
This article extends the work presented by Falzon and Paterson12 and takes a closer look at the security of the GES of Ghosh, Kamara, and Tamassia (GKT) from ASIA CCS 2021.10 This scheme will henceforth be referred to as the GKT scheme. The GKT scheme encrypts a graph such that when a shortest path query is issued for some vertices and of , the server returns information allowing the client to quickly recover the shortest path between and in . The scheme precomputes a matrix called the SP-matrix from which shortest paths can be efficiently computed, and then creates an encrypted version of this matrix, which we refer to as the encrypted database (). is sent to the server. At query time, the client computes a search token for the query ; this token is sent to the server and is used to start a sequence of look-ups to . Each look-up results in a new token and a ciphertext encrypting the next vertex on the shortest path from to . The concatenation of these ciphertexts is returned to the client, and decrypting this sequence reveals the vertices in the shortest path.
The GKT scheme of Ghosh et al.10 is very elegant and efficient. For a graph on vertices, computing the SP-matrix takes time and dominates the setup time. Building a search token involves computing a pseudo-random function. Processing a query at the server requires look-ups in , where is the length of the shortest path from to . Importantly, thanks to the design of the scheme, query processing can be done without interaction with the client, except to receive the initial search token and to return the result. This results in revealing—at query time—the sequence of labels (tokens) needed for the recursive look-up and the sequence of (encrypted) vertices that is eventually returned to the client.
Ghosh et al.10 provide a security proof of the GKT scheme in a simulation-based security model that assumes an honest-but-curious (semi-honest) server. The approach identifies a leakage profile for the GKT scheme and formally proves that the scheme leaks nothing more than this. The leakage profile comes in two parts: setup leakage (available to the server upon receipt of the encrypted data structure) and query leakage (that becomes available to the server as it processes each query). Specifically, the query leakage leaks when two queries are equal, that is, the query pattern, the length of the queried path, and how two paths with the same destination intersect.
This work exploits the query leakage of the GKT scheme to mount a query recovery (QR) attack against the scheme. This attack can be mounted by the honest-but-curious server and requires knowledge of the graph . This may appear to be a strong requirement, but it is, in fact, weaker than is permitted in the security model of Ghosh et al.,10 where the adversary can even choose . Assuming that the graph is public is a standard assumption for many schemes that support private graph queries.10,13,14 There are many settings in which the graph may be known but the edge weights are private, or in which one only wishes to protect the privacy of the client’s queries. This model is ideal for routing and navigation systems in which the road network may easily be obtained online via Google Maps or Waze, but the client may wish to keep its queries private. In such a scenario, the map and traffic information are widely available, but the routing information of individual users is sensitive.
The attack has two phases. First, it has an offline, preprocessing phase that is carried out on the graph . In this phase, a plaintext description of all the shortest path trees in is extracted. These trees are then processed, and the candidate queries for each issued query are computed using each tree’s canonical name. A canonical name is an encoding of a graph that can be used to decide if two graphs are isomorphic; a canonical name of a rooted tree can be computed efficiently using the Aho, Hopcraft, and Ullman (AHU) algorithm.15 This concludes the offline phase of the attack. Its time complexity is , where is the number of vertices in , and matches the run time of our overall attack and the run time of the GKT scheme’s setup. Both the attack and the setup run-time are lower bounded by the time to compute the all-pairs shortest paths (APSP), which takes time for general graphs.16
The second phase of the attack is online: as queries are issued, the adversary constructs a second set of trees that correspond to the sequence of labels computed by the server when processing each query, that is, the per-query leakage of the scheme. That leakage is uniquely determined by the search token that initiates the look-up. This description uses the labels of (which are search tokens) as vertices; two labels are connected if the first points to the second in . When an entire tree has been constructed, the adversary can then run the AHU algorithm again to compute the canonical names associated with this query tree. An entire query tree can be built when all queries to a particular destination have been issued. In practice, this is a realistic routing scenario where many trips may share a common popular destination (e.g. an airport, school, or distribution center).
By correctness of the scheme, there exists a collection of isomorphisms mapping to at least one tree computed in the offline phase. Such isomorphisms also map shortest paths to shortest paths. A matching between paths in the trees from the online phase to the trees in the offline phase is thus performed. This can be done efficiently using a novel extension of the AHU algorithm,15 which decides when one path can be mapped to another by an isomorphism of trees. This yields two look-up tables which, when composed, map each path in the first set of trees to a set of candidate paths in the second set. The search token of the queries associated with is then used to look-up the possible candidate queries in the tables computed in the online phase. These candidate queries are then returned. The run time of this phase is where is the number of complete query trees computed in the online phase. The output is guaranteed to contain the correct query.
In general, the leakage from a query can be consistent with many candidates, and the correct candidate cannot be uniquely determined. Graph theoretically, this is because there can be many isomorphisms between pairs of trees in the two sets. In the chosen graph setting, it is easy to construct a graph where, given any query tree of , its isomorphism is uniquely determined and there is a unique candidate for each query of , that is, one can achieve what is called full QR (FQR). For such graphs, the GKT scheme offers almost no protection to queries. In other cases, the query leakage may result in one or only a few possible query candidates, which may be damaging in practice. In order to explore the effectiveness of the attack, the theoretical results are supported with experiments on eight real-world graphs (six of which were used in Ghosh et al.10) and on random graphs with varying graph sizes and edge probabilities. The results show that for the real-world graphs, as many as 21.9% of all queries can be uniquely recovered, and as many as half of all queries can be mapped to at most three candidate queries. The experimental results show that QR tends to result in smaller sets of candidate queries when the graphs are less dense, and that dense graphs tend to have more symmetries and hence result in larger sets of candidate queries. Note also that the attack is the best possible: it always outputs a minimal set of candidates consistent with the query leakage, and the correct query is always included in the set.
This work extends that of Falzon and Paterson12 along numerous dimensions, including formal proofs of their claims (Section 4), supporting examples (Section 4), additional experiments on random graphs that better demonstrate the relationship between attack success and graph density (Section 5), and a new variation of the attack from the perspective of a network adversary (Appendix A).
The contributions can be summarized as follows:
This work better formalizes the attack in Falzon and Paterson12 against a GES that supports shortest path queries. This attack works in a passive-persistent server-side adversarial setting. A novel extension of the attack in a network-adversarial setting is also presented.
This work leverages the GKT scheme’s leakage to mount an efficient QR attack against the scheme. In particular, for the real-world datasets used, the set of all query trees can be recovered with as few as 68.1% of the queries.
This work makes use of the classical AHU algorithm for the graph isomorphism problem for rooted trees. A new algorithm for deciding when a path in one tree can be mapped onto a path in another tree under an isomorphism is presented.
This work also reports on an implementation of the attack in Python and a thorough evaluation against real-world datasets and random graphs.
Looking ahead toward building new schemes, it is important to better understand how leakage can be exploited. This attack demonstrates that leaking the topology of subtrees of the encrypted graph is detrimental, and that a noninteractive scheme that relies on chaining search tokens may leak too much information to the server. This is true in part because many problems that are not known to be polynomial-time solvable on general graphs can be solved in polynomial time on trees (i.e. the graph isomorphism problem vs. the tree isomorphism problem). The characterization of the GKT scheme’s leakage may thus help inform the construction of more secure GESs.
Prior and related work
This section describes prior and related work concerning GESs and leakage-abuse attacks on structured encryption.
Graph encryption
Chase and Kamara8 present the first GES that supports both adjacency queries and focused subgraph queries. Poh et al.17 give a scheme for encrypting conceptual graphs. Meng et al.9 present three schemes that support approximate shortest path queries on encrypted graphs, each with a slightly different leakage profile. To reduce storage overhead, their solution leverages sketch-based oracles that select seed vertices and store the exact shortest distance from all vertices to the seeds; these distances are then used to estimate shortest paths between any two vertices in the graph. Ghosh et al.10 and Wang et al.11 present schemes that support exact shortest path queries on encrypted graphs.
Other solutions for privacy-preserving graph structures use other techniques such as secure multiparty computation and private information retrieval (e.g. Wu et al.18 and Lai et al.19) and differential privacy (e.g. Sala et al.20). These approaches have different security goals from EDB schemes that are built on symmetric encryption.
Attacks
While many schemes fitting the EDB paradigm have been developed, the security that these schemes offer is not yet fully understood. Security analysis is often done by developing attacks that reconstruct either queries or the database from the leakage functions. Leakage analysis of searchable symmetric encryption (SSE) schemes has been studied in a number of settings including both active21–23 and passive21,22,24–26 adversarial settings. Recently, a number of works analyze the leakage stemming from schemes that support more complex query types such as range queries on one attribute27–34 and multiattribute35,36 data, and -nearest neighbor queries.37 The leakage of GESs was first analyzed by Goetschmann.38 The author considers schemes that support approximate shortest path queries that use sketch-based distance oracles (e.g. Meng et al.9), present two methods for estimating distances between nodes, and give a QR attack that aims to recover the vertices in an encrypted query; the experimental evaluation demonstrates that with auxiliary knowledge on some queries, the adversary can distinguish among candidate vertices which vertex was queried. This work also presents a QR attack, but uses knowledge of the graph rather than partial knowledge of some queries.
Preliminaries
Notation
For an integer , let . The concatenation of two strings and is denoted as .
A dictionary is a map from some label space to a value space ; indicates that . A multimap is a generalization of a dictionary that maps labels to sets of values.
Graphs
A graph is a pair consisting of a vertex set of size and an edge set of size . A graph is directed if the edges specify a direction from one vertex to another. Two vertices are connected if there exists a path from to in . In this paper, it is assumed that all graphs are connected for simplicity. However, the attack and its constituent algorithms directly apply to multicomponent graphs.
A tree is a connected, acyclic graph. A rooted tree is a tree in which one vertex has been designated the root. For some rooted tree and vertex , denotes the subtree of induced by and all its descendants.
Given a graph and some vertex , a single-destination shortest path (SDSP) tree for is a directed spanning tree such that is a subgraph of , is the only sink in , and each path from to in is a shortest path from to in . An example of an SDSP tree can be found in Figure 1(c).
(a) Original graph , (b) its corresponding single-destination shortest path (SDSP) tree for vertex in with the canonical names labeling all the vertices of the tree, and (c) the matching query tree that is leaked during setup (without any vertex labels). (a) Original graph ; (b) SDSP tree for vertex ; and (c) the inferred leakage.
This work makes use of two binary options on graphs. Given two graphs and , the union of and is defined as . Given a graph and a subgraph such that , the graph subtraction of from is defined as .
Hash functions
A set of functions is a universal hash function family if, for every distinct , the hash function family satisfies the following constraint:
In Section 5, the universal hash function is instantiated using a cryptographic hash function.
Graph isomorphisms
The attack will make heavy use of graph isomorphisms and automorphisms. In particular, because the leakage profile of the GKT leaks the network topology of spanning subtrees of the original graph , recovery is information theoretically possible up to graphs with the same topology.
An isomorphism of graphs and is a bijection between vertex sets such that for all if and only if This can be succinctly denoted as .
An isomorphism of rooted trees and is an isomorphism from to (as graphs) such that .
Canonical names
A canonical name is an encoding mapping graphs to bit-strings such that, for any two graphs and , if and only if . For rooted trees, AHU15 describe an algorithm for computing a specific canonical name in time. We refer to this as the canonical name and describe it next.
This work makes use of a modified AHU algorithm, denoted as , to compute the canonical names of rooted trees (and their subtrees) and determine if they are isomorphic. takes as input a rooted tree , a vertex , and an empty dictionary . It outputs the canonical name of the subtree (which is also referred to as the canonical name of ) and a dictionary that maps each descendant of to the canonical name of . The algorithm proceeds from the leaves to the root. It assigns the name “” to all leaves of the tree. It then recursively visits each descendent of and assigns a name by sorting the names of its children in increasing lexicographic order, concatenating them into an intermediate name and assigning the name “” to (see Figure 1(b) for an example). The canonical name of , , is the name assigned to the root by this algorithm.
takes time and space where . Note that the original AHU algorithm can be modified to run in time by only considering one level at a time and reassigning integers to the vertices at that level.15 In contrast, the attack must assign names to each vertex in the tree in order to later compute the path names, so we are forced to make use of the version.
The pseudocode of can be found in Algorithm 1.
The GKT GES
This section gives an overview of the GES of Ghosh et al.10 and its leakage.
GKT scheme overview
The GKT scheme supports single pair shortest path (SPSP) queries. The graphs may be directed or undirected, and the edges may be weighted or unweighted. An SPSP query on a graph takes as input a pair of vertices , and outputs a path such that . This path must be of minimal length in , that is, there does not exist a sequence of edges such that .
SPSP queries may be answered using a number of different data structures. The GKT scheme makes use of the SP-matrix.39 For a graph , the SP-matrix is a matrix defined as follows. Entry stores the second vertex along the shortest path from vertex to ; if no such path exists, then it stores . An SPSP query is answered by computing to obtain the next vertex along the path and then recursing on until is returned.
At a high level, the GKT scheme proceeds by computing an SP-matrix for the query graph and then using this matrix to compute a dictionary . This dictionary is then encrypted using a dictionary encryption scheme (DES) such as Cash et al.40 and Chase and Kamara.8 To ensure that the GKT scheme is noninteractive, the underlying DES must be response-revealing. Since it is germane to this work, the syntax of a DES is described next.
A DES is a tuple of four algorithms , with the following syntax:
is probabilistic and takes as input a security parameter , and outputs a secret key .
takes as input a secret key and dictionary , and outputs an encrypted dictionary ().
takes as input a secret key and a label , and outputs a search token .
takes as input a search token and an , and returns a plaintext value .
Correctness for a states that for all dictionaries , for all keys output by and for pairs in , executing on input and dictionary results in output .
Note that while the GKT scheme itself is response-hiding (i.e. the shortest path is not returned in plaintext to the client), the underlying DES used in the scheme is response-revealing, that is, the values in its are revealed at query time. The response-revealing property of the DES is necessary to enable the GKT scheme to operate in a noninteractive manner.
A more detailed description of the GKT scheme will now be provided. At setup, the client generates two secret keys: one for a symmetric encryption scheme , and one for a . It takes the input graph and computes the SP-matrix . It then computes a dictionary such that for each pair of vertices , it sets if and if in the SP-matrix for some vertex .
The client then computes a second dictionary as follows. For each label-value pair in , the following steps are carried out. A search token is computed from using algorithm and a ciphertext is computed by encrypting using . Then is set to .
The resulting dictionary is then encrypted using to produce an output , which is given to the server.
Now the client can issue an SPSP query for a vertex pair by generating a search token for and sending it to the server. The server initializes an empty string and uses to search and obtain a response . If , then it returns . Otherwise, it parses as , updates and recurses on until is reached on look-up. The server returns , a concatenation of ciphertexts (or ) to the client. The client then uses its secret key to decrypt , obtaining a sequence of pairs from which the shortest path from to can be constructed.
Complexity. The GKT scheme’s setup takes time and is dominated by the cost of computing the SP-matrix. Token generation takes time (assuming use of an efficient DES) and querying takes time where is the maximum length of a shortest path in . The server storage is .
Leakage of the GKT scheme
Ghosh et al.10 provide a formal specification of their scheme’s leakage. Informally, the setup leakage of their scheme is the number of vertex pairs in that are connected by a path, while the query leakage consists of the query pattern (which pairs of queries are equal), the path intersection pattern (the overlap between pairs of shortest paths seen in queries), and the lengths of the shortest paths arising in queries. See Section 4.1 in Ghosh et al.10 for more details.
In the GKT scheme, the server obtains by encrypting the underlying dictionary , in which labels are of the form and values are of the form , using a DES. Here is a search token obtained by running on a pair and is obtained by running also on . Since is obtained by running on , this means that the labels in are derived from tokens obtained by running on inputs . Moreover, these tokens also appear in the values in that are revealed to the server at query time, that is, in the entries .
In turn, the query leakage reveals to the server the token used to initiate a search, as well as all the subsequent pairs that are obtained by recursively processing such a query. Let us denote the sequence of search tokens associated with the processing of some (unknown) query for a shortest path of length as . This string is referred to as the token sequence of . Since the search tokens correspond to the sequence of vertices in the queried path, there are as many tokens in the sequence as there are vertices in the shortest path. By correctness of used in the construction of , no two distinct queries can result in the same token sequence (in fact no two distinct queries can produce the same first token , since each such first token must be used to derive a unique label in identifying the beginning of a specific shortest path).
Notice also that token sequences for different queries can be overlapping; indeed since the tokens are computed by running on inputs where is the final vertex of a shortest path, two token sequences are overlapping if and only if they correspond to queries (and shortest paths) having the same end vertex. Hence, given the query leakage of a set of queries, the adversary can compute all the token sequences and construct from them directed trees, , each tree having at most vertices and a single root vertex. The vertices across all trees are labeled with the search tokens in and there is a directed edge from to if and only if and are adjacent in some token sequence. (Each tree has at most vertices because of our assumption about being connected.)
This set of trees is called the query trees. Each query tree corresponds to the set of queries having the same end vertex. Each tree has a single sink (root) that corresponds to a unique vertex . The tree paths correspond to the shortest paths from vertices to , such that and are connected in . Note that Ghosh et al.10 also discuss these trees, but they do not analyze the theoretical limits of what can be inferred from them.
The leakage of the GKT scheme on a graph after issuing a set of SPSP queries is denoted as . For a formal proof of security that establishes the leakage profile of the GKT scheme, please refer to Ghosh et al.10 The attacks in this work are based only on the leakage of the scheme, as established above, and not on breaking the underlying cryptographic primitives of the scheme.
Implications of leakage
Suppose that all queries have been issued and that we have constructed all query trees , each tree having vertices. Observe that there exists a one-to-one matching between the query trees and the SDSP trees of such that each matched pair of trees is isomorphic. The reason is that the query trees are just differently labeled versions of the SDSP trees; in turn, this stems from the fact that paths in the query trees are in 1–1 correspondence with the shortest paths in .
This now reveals the core of the QR attack, developed in detail in Section 4 below. The server with access to first computes all the SDSP trees offline. As queries are issued, it then constructs the query trees one path at a time. Once a complete query tree is computed (recall that each query tree must have vertices since is connected), the server finds all possible isomorphisms between and the SDSP trees. Then, for each token sequence in , it computes the set of paths in the SDSP trees to which that token sequence can be mapped under the possible isomorphisms. This set of paths yields the set of possible queries to which the token sequence can correspond. This information is stored in a pair of dictionaries, which can be used to look-up the candidate queries.
To illustrate the core attack idea, Figure 1 depicts (Figure 1a) a graph , (Figure 1b) its SDSP tree for vertex (with vertex labels and canonical names), and (Figure 1c) the matching query tree (without vertex labels). It is then clear that the leakage from the unique shortest path of length 2 in Figure 1(c) can only be mapped to the corresponding path with edges , in Figure 1(b) under isomorphisms, and similarly the shortest path of length 1 that is a subpath of that path of length 2 can only be mapped to path . On the other hand, the three remaining paths of length 1 can be mapped under isomorphisms to any of the length 1 paths , , or and so cannot be uniquely recovered.
Since the adversary only learns the query trees and token sequences from the leakage, the degree of QR that can be achieved based on that leakage is limited. In particular, without auxiliary information, the adversary can only recover the candidate queries up to symmetries arising from the isomorphisms between the query trees and the SDSP trees. In practice, this is often not an issue since many queries result in only a very small number of candidate queries (see Section 5 for more details).
Query recovery (QR)
Threat model and assumptions
This work considers a passive, persistent, honest-but-curious adversary that has compromised the server and can observe the initial search token issued, all subsequent search tokens revealed during the query processing, and the response. In particular, the adversary could be the server itself. Appendix A outlines a modified version of the attack in which the adversary is assumed to have only compromised the communication channels between the client and server; this adversary can thus only see the search tokens used to initiate the recursive look-up and the server responses.
The adversary is assumed to know the graph that has been encrypted to create . As noted previously, this is a strong assumption, but it fits within the security model used in Ghosh et al.10 (where can even be chosen) and is realistic in many routing/navigation scenarios. It is further assumed that the adversary sees enough queries to construct a subset of the query trees. Computing all trees does not require observing all possible queries; in the real-world datasets tested, it was possible to construct all query trees with as few as of the possible queries. This is because constructing a query tree that corresponds to only requires observing the queries that start at the leaf nodes of and end at . In SDSP trees with few leaves, only a small fraction of queries is needed.
It is assumed that the APSP algorithm used in constructing the SP-matrix from during setup is deterministic. Moreover, it is assumed that this algorithm is known to the adversary. Such an assumption is reasonable as the adversary knows and many shortest path algorithms are deterministic, including Floyd-Warshall16 and many of its adaptations.
Formalizing QR attacks
QR in general is the goal of determining the plaintext value of queries that have been issued by the client. The notion of QR was introduced by Islam et al.24 in the context of leakage-abuse attacks on SSE schemes and has been extensively studied in the context of SSE and related schemes since.
This work studies the problem of QR in the context of GESs, specifically, the GKT scheme: given , the setup leakage of the GKT scheme and the query leakage from a set of SPSP queries, the adversary’s goal is to match the leakage for each SPSP query with the corresponding start and end vertices of a path in . As noted above, there may be a number of candidate queries that can be assigned to the leakage from each query. The adversary’s goals are formally described below.
(Consistency) Let be a graph, be the set of SPSP queries that are issued, and be the set of token sequences of the queries issued. An assignment is a mapping from token sequences to SPSP queries. An assignment is said to be consistent with the leakage if it satisfies .
Informally, consistency requires that, for each , the query specified by assignment could feasibly result in the observed leakage .
(QR) Let be a graph, be a set of SPSP queries, and the corresponding set of token sequences. Let be the set of all assignments consistent with . The adversary achieves QR when it computes and outputs a mapping: for all .
The adversary achieves QR if, for each (a set of token sequences resulting from queries in ), it outputs a set of query candidates containing every query that is consistent with the leakage. Note that this implies that the output always contains the correct query (and possibly more). This is the best the adversary can do, given the available leakage.
There is some information not conveyed in this mapping. In particular, by fixing an assignment for a given token sequence, it may be possible to fix or reduce the possible assignments for other query responses. Such an example is given below.
Suppose one observes the set of token sequences such that correspond to paths of length 1 and corresponds to a path of length 2, with a subsequence of , and which allows one to construct the query tree in Figure 1(c). Further, suppose that the resulting query tree is not isomorphic to any other query tree. Thus, it is possible to infer that all queries in are rooted at 1. An adversary achieving QR must output the following mappings:
However, if the adversary could fix the assignment to (e.g. by using auxiliary information), then could only be mapped to either or .
A special type of QR when there exists only one assignment consistent with the query leakage is now defined, that is, the case when all queries can be uniquely recovered.
(FQR) Let be a graph, be a set of SPSP queries, and the corresponding set of token sequences. Let be the set of assignments consistent with . An adversary is said to achieve FQR when it (a) achieves QR and (b) .
That is, there is a unique assignment of token sequences to queries consistent with the leakage. Whether FQR is always possible (i.e. for every possible set of queries ) depends on the graph . Specifically, FQR is always possible if and only if each SDSP tree arising in is nonisomorphic and every path in each SDSP tree is fixed by all automorphisms of the tree. It is easy to construct graphs for which these conditions hold (see Section 4.10). For such graphs, the QR attack always achieves FQR.
Technical results
This section develops some technical results concerning isomorphisms of trees and the behavior of paths under those isomorphisms that will be needed in the remainder of the paper.
For any rooted tree and any , let denote the subtree induced by and all its descendants in .
Let and be rooted trees. Let and be paths in and , respectively. If there exists an isomorphism such that , then and for all .
By assumption and by definition of isomorphism of rooted trees . Since is a tree, there exists a unique path between and , and between and . Isomorphisms of graphs must be edge preserving, and so must map the subgraph to . These two paths can only be isomorphic if they are the same length and thus . Putting together these two facts implies that
which concludes the proof.
Given a rooted tree and any , let denote the concatenation of the canonical names of vertices along the path from to in , separated by semicolons:
Computing path names will form the core of the QR attack. A sequence of results about the relationship between path names and isomorphisms will now be proven. Section 4.5 explains how to apply a universal hash function to the path names to compress their length from to bits, thereby reducing storage and run time complexity.
Let and be isomorphic rooted trees and let and denote the set of children of and , respectively. There is an isomorphism from to if and only if there is a perfect matching from to such that for each matched pair , there exists an isomorphism .
To see the forwards direction, let denote an isomorphism from to and note that if for , then by the edge-preservation property of isomorphisms, must map the vertices of to the vertices of , and thus . For the backwards direction, construct an isomorphism from to as follows. Let be the trivial isomorphism that takes to and let
Let . If is an edge in for some then it is easy to see that by restricting to the vertices in , we have that is an edge in . If for some , then . Since is a child of then . A similar argument holds for showing that if , then .
Let and be isomorphic rooted trees. Let and be children of and , respectively. Suppose that is an isomorphism from to . Then there exists an isomorphism from to such that and .
Let and denote the set of children of and , respectively. Since is an isomorphism and is edge preserving, then it must map to , and we necessarily have that .
Proposition 4.6 can now be used to prove the lemma. Let be any isomorphism from to . If maps to then the lemma holds. Otherwise, and for some and . By Proposition 4.6, and , and by assumption . Thus, by transitivity, it follows that . Let be the vertices in and let be an isomorphism from to . Then is a collection of isomorphisms on all the trees rooted at the children of the roots. Thus, is an isomorphism from to that maps to .
The main technical result can now be introduced:
Let and be rooted trees and let and . There exists an isomorphism mapping to if and only if .
The forward direction follows from Lemma 4.5.
For the backward direction, suppose that . Since a path name includes the canonical name of the entire tree, we deduce that ; it follows that . Similarly, one can deduce that . More generally, let and be paths in and , respectively. Then for all it must be that .
The result will now be proven inductively on the vertices along the path from to . For the base case, take any isomorphism from to and note that this must necessarily map to .
This reasoning can be extended level-by-level upwards, at each stage using the equalities of components of the two path names to extend the isomorphism. Suppose that for there exists an isomorphism from to such that . By equality of path names, we have that . Note also that and are children of and , respectively. Applying Lemma 4.7, there exists an isomorphism from to such that . Since is a vertex in , it follows that . This completes the induction and with it the proof.
Theorem 4.8 also gives a method for identifying when there exists only a single isomorphism between two rooted trees. Suppose that and are isomorphic rooted trees and that every vertex has a distinct path name; then there exists exactly one isomorphism from to . Intuitively, a vertex in can only be mapped to a vertex in with the same path name. So if path names are unique, then each vertex in can only be mapped to a single vertex in , meaning there is only a single isomorphism available. The converse also holds: if there exists exactly one isomorphism from to , then every vertex necessarily has a distinct path name. This observation will be useful in characterizing when query reconstruction results in FQR. We summarize with:
Let and be isomorphic rooted trees. Every vertex has a unique path name in if and only if there exists a single isomorphism from to .
Overview of the QR attack
Our QR attack takes as input the graph , a set of token sequences corresponding to the set of issued queries, and comprises the following steps:
Preprocess the graph offline (Algorithm 3). Compute the SDSP trees of graph . Then construct a multimap such that maps each path name arising in the to the set of SPSP queries whose start vertices have the same path name.
Compute the query trees online. Construct the query trees from the token sequences as the queries are issued.
Process the query trees (Algorithm 4). Compute a dictionary that maps each token sequence to the path name of the start vertex of the path.
Note that steps 4.4. and 4.4. are trivially parallelizable. In the case that the APSP algorithm is randomized, the adversary can simply run the attack multiple times to account for different shortest path trees.
In practice, the attack can output a single large table matching token sequences to sets of queries. However, storing this large table will be more expensive than storing and when has high symmetry. Moreover, can be indexed by the first token in each token sequence (since uniquely determines the sequence).
The following subsections expand upon the steps in the above overview.
Computing the path names
Before diving into the attack, the algorithm for computing path names, which is used as a subroutine of the attack, is described. Algorithm 2 () takes as input a rooted tree and outputs a dictionary mapping each vertex to its path name. First, Algorithm 1 () is called on tree , its root , and an empty dictionary , to obtain a dictionary that maps each vertex to the canonical name of subtree .
A function drawn from a universal hash function family is used to compress the path names from to . An empty dictionary is initialized and updated to include . is then traversed in a depth-first search manner; when a new vertex is discovered during traversal, is set to the hash of the concatenation of the name of and the path name of its parent , that is,
When all vertices have been explored, is returned. The pseudocode for can be found in Algorithm 2.
Let be a rooted tree and be the output of running Algorithm 2 on . Let be a universal hash function family mapping . Then, for randomly sampled , the expected number of collisions in is at most .
Let be distinct and let and be paths in . Thus
and similarly for . For there to be a collision between their path names then either: (1) and collide or (2) for some and , it must be that and collide. Recall that a canonical name is unique up to isomorphism of the rooted tree.
Let denote the event that the path names of and collide, and let denote the event that the th nested hash of ’s and ’s path names collide. A collision on the path names occurs when any of the at most pairs of nested hash values (used to compute the path names of and ) collide. By definition of universal hash function, and thus by linearity of expectation,
Let denote the event of any collision of path names in . Then, by linearity of expectation, the expected number of collisions is
Let be a graph and let be the set of SDSP trees of . Let be the union of the outputs of running Algorithm 2 on each tree in . Let be a universal hash function family mapping . Then for randomly sampled , the expected number of collisions in is at most .
To achieve a smaller probability of collision, one can choose a hash function family whose output length is , where . For simplicity, the universal hash function is invoked using SHA-256 truncated to 128 bits.
Let be a rooted tree on vertices and be a universal hash function family mapping . Upon input of , Algorithm 2 returns a dictionary of size mapping each to a hash of its path name in time .
Correctness follows easily from Theorem 4.10 and by a recursive argument.
Calling (Algorithm 1) takes time. Reading the name of the root and assigning the hash of its name takes at most time . Every node is pushed onto the stack once, and thus the loop on line 12 iterates times. Assigning a new path name on line 16 takes time since is bits, is bits, and computing the hash takes constant time. Pushing the children of a given vertex onto the stack takes time for a total run time of . maps the vertices to the hash of their path names. Each vertex and its hashed path name can be encoded with bits, yielding a dictionary of size .
Preprocess the graph
The original graph is first preprocessed into the SDSP trees. Since the adversary is assumed to have knowledge of , this step can be done offline. The same APSP algorithm used at setup is also used on to compute the SDSP trees , where tree is rooted at vertex . For unweighted, undirected graphs, one can use breadth-first search for a total run time of where . For general weighted graphs, this step has a run time of .16
Next, the path names of each vertex in are computed and a multimap is constructed. maps the (hashed) path name of each vertex in to the set of SPSP queries whose start vertices have the same path name. Theorem 4.8 is leveraged to construct this map as described below.
An empty multimap is initialized. For each , is computed by running Algorithm 2 () on tree . For each vertex in , is computed and the label is looked up in . If the label exists, then . Otherwise . The pseudocode for computing can be found in Algorithm 3.
Let be a graph on vertices. Upon input of , Algorithm 3 returns a multimap of size mapping each to its corresponding path name in time .
For each vertex , a dictionary mapping each vertex in to its respective path names is computed. The correctness of path names follows from Lemma 4.12.
The run time is now analyzed. Computing the all-pairs shortest path takes time . The loop on line 5 iterates through vertices. For each vertex in , running Algorithm 2 () takes time and the inner loop on line 9 takes time. Thus, the loop on line 5 takes a total time of .
The multimap maps hashes of the path names to a list of candidate queries. The hashed path names have size and there are at most distinct path names; each query corresponds to only one path name and is bits long. The multimap thus has a total size .
Process the search tokens
The tokens revealed at query time must now be processed. Recall that the tokens are revealed such that the response to any shortest path query can be computed noninteractively. When a search token is sent to the server, the server recursively looks up each of the encrypted vertices along the path. The adversary can thus compute the query trees using the search tokens revealed at query time. First, it initializes an empty graph .
As label–value pairs are revealed in , the adversary parses and , and adds as a directed edge to . At any given time, will be a forest comprised of trees, , such that each has at most nodes. Identifying the individual trees in the forest can be done in time . The adversary can compute the query trees online, and the final step of the attack can be run on any set of complete query trees. A complete query tree corresponds to the set of all queries to some fixed destination vertex. For ease of explanation, Algorithm 4 () takes as input the set of all complete query trees that have been constructed from the leakage.
Map the token sequences to SPSP queries
In the last step, the set of complete query trees is used as input. The path names of each vertex in the are used to construct a dictionary that maps each token sequence to the path name of the starting vertex of the corresponding path in its respective query tree. An empty dictionary is initialized. For each complete query tree , is computed and added to . The pseudocode for computing can be found in Algorithm 4.
Let be a graph and be an encryption of using the GKT scheme. Let be the query trees constructed from the leakage of queries issued to . Upon input of , Algorithm 3 returns a dictionary mapping each path name to a set of SPSP queries in time . Upon input of and , Algorithm 4 returns a dictionary mapping token sequences to path names in time . Moreover, the outputs and have the property that, for any token sequence corresponding to a path in a query tree and for every query , there exists an isomorphism from to such that and .
The correctness of follows from the correctness of the GKT scheme. Dictionary contains a map of each vertex in to its path name. The correctness of and follows from Lemmas 4.13 and 4.12, respectively.
Let be a pair comprised of a nonroot vertex and a root vertex in a complete query tree , and let be the token sequence corresponding to . Let . By composition of and , it must be that . Applying Theorem 4.8, there is thus an isomorphism from to that maps to and to .
With regard to run time, preprocessing (Algorithm 3) takes time . Computing the path names (Algorithm 2) of trees takes time, which is an upper bound on the run time of the whole attack.
Recover the queries
Once the map between each node (token) in a query tree and its corresponding path name has been computed, the attacker can use and to compute the candidate queries of all queries in the complete query trees. Given and (outputs of Algorithms 3 and 4, respectively) and an observed token matching a query in the query trees for some unknown query, the adversary can find the set of queries consistent with by simply computing .
Full query recovery (FQR)
This section is concluded with a discussion of when FQR is possible. By the correctness of the attack, this is the case for a graph , a set of complete query trees , and associated token sequences when for , , and all , .
A condition for FQR feasibility can also be described in graph-theoretic terms. Recall Corollary 4.9, which states that given two isomorphic rooted trees and , if each vertex in has a unique path name, then there exists only one isomorphism from to . FQR is thus always achievable for any set of complete query trees, when all vertices in the SDSP trees have unique path names. More formally:
Let be a graph and let be the set of SDSP trees of . Suppose every vertex in has a unique path name (and in particular, each has a unique canonical name). Then, FQR can always be achieved on any complete query tree(s). The converse is also true.
By correctness, the attack achieves FQR whenever it is possible. Figure 2 depicts a graph for which FQR is always possible. Indeed, each tree has a unique canonical name, and for all , each vertex in has a unique path name. More generally, let be the family of graphs having one central vertex and any number of paths all of distinct lengths appended to . It is easy to see that the attack achieves FQR for all graphs .
An example graph for which FQR is always possible, no matter which set of SPSP queries is issued. FQR: full query recovery; SPSP: single pair shortest path.
Experiments
The theoretical results are supported by experiments on both real-world datasets and random graphs.
Implementation details
The attack was implemented in Python 3.7.6 and ran on a computing cluster with a Core Intel Xeon Gold 6258R 2.7 GHz Processor (Turbo up to 4 GHz/AVX512 Support), and 384 GB DDR4 2933 MHz error correcting code memory. To generate the leakage, the GES from Ghosh et al.10 was implemented, and the same machine for both the client and server was used. The cryptographic primitives were implemented using the PyCryptodome library version 3.10.141; AES-CBC with a 16B key was used for symmetric encryption, and SHA-256 was used for collision-resistant hash functions SHA-256. For the DES, from Cash et al.40 was implemented, and the tokens were generated using hash-based message authentication code with SHA-256 truncated to 128 bits. The shortest paths of the graphs were computed using the single_source_shortest_path algorithm from the NetworkX library version 2.6.2.42
The QR attack was implemented using the same shortest path algorithm from NetworkX as in the scheme implementation. An implementation of the AHU algorithm (Algorithm 1) was used to compute canonical names. As mentioned previously, the attack is highly parallelizable, and this property was exploited when implementing the attack.
Graph datasets
The attack was evaluated on six of the same datasets as Ghosh et al.10; in addition, the InternetRouting dataset from the University of Oregon Route Views Project (collected on 2 January 2000) and the facebook-combined dataset were also used. All eight of these datasets were obtained from Leskovec and Krevl.43 The InternetRouting and CA-GrQc datasets were extracted from the original datasets using the dense subset extraction algorithm by Charikar44 as implemented by Ambavi et al.45 Details about these datasets can be found in Table 1, and a summary of the attack results can be found in Table 2.
A description of the real-world datasets used in the experimental evaluation; denotes the number of vertices; denotes the number of edges of the graph dataset; denotes the density of the graph.
Dataset
# Comp
InternetRouting
35
323
0.543
1
CA-GrQc
46
1030
0.995
1
Email-Eu-core
1005
16,706
0.0331
20
Facebook-combined
4039
88,234
0.011
1
p2p-Gnutella08
6301
20,777
0.001
2
p2p-Gnutella04
10,876
39,994
0.0006
1
p2p-Gnutella25
22,687
54,705
0.0002
13
p2p-Gnutella30
36,682
88,328
0.0001
12
A comparison of the attack results for the real-world datasets.
The third column denotes the fraction of queries that are recoverable up to one candidate. The fourth column denotes the smallest fraction of queries needed to reconstruct all query trees. The last three columns show the 50th, 90th, and 99th percentiles obtained for QR on the eight real-world datasets.
In addition to the real-world datasets, the attack was also deployed on random graphs for and edge probabilities . The graphs were generated using the fast_gnp_random_graph function from NetworkX.42
Query reconstruction results
Real-world datasets. The attack was carried out on the Internet Routing, CA-GrQc, email-EU-Core, facebook-combined, and p2p-Gnutella08 datasets; the online portion of the attack (Algorithm 4) given all queries ran in 0.087 s, 0.093 s, 5.807 s, 102.670 s, and 339.957 s for each dataset, respectively. For the first four datasets, attacks given 75% and 90% were also run. The results were averaged over 10 runs, and the vertices were sampled as follows: the start vertex was chosen uniformly at random, and the end vertex was chosen with probability linearly proportional to its out degree in the original graph. This simulates a more realistic setting in which certain “highly connected” destinations are chosen with higher frequency. The results of these experiments can be found in Table 3. Queries can be reconstructed with just 75% of the queries. In fact, for the Facebook-combined dataset, complete query trees can be observed with high probability after only observing 20% of the queries.
For the remaining datasets, simulations were run to demonstrate the success that an adversary could achieve given 100% of the queries. The simulations were carried out as follows. Given , the SDSP trees and the path names for each vertex in these trees were computed, and then a dictionary mapping each query in to the set of candidate queries was constructed by identifying queries whose starting vertices have the same path name. The simulations only used the plaintext graph, and the results show the success that an adversary would achieve in an end-to-end attack. Simulations were used for the larger graphs since storing all responses is memory-intensive; in practice, the attack can be run on larger datasets by writing the map out to a back-end key-value store. These results can be found in the bottom row of Table 3.
CDFs for QR of the real-world data sets after observing (row 1) 75%, (row 2) 90%,
and (rows 3 and 4) 100% of the queries.
Table 1 reports the percentage of uniquely recoverable queries when the attack is run on the set of all query trees. Uniquely recoverable queries are queries whose responses result in only one candidate. CA-GrQc had the smallest percentage of uniquely recoverable queries (0.145%) and the p2p-Gnutella04 had the largest percentage (21.911%). The small percentage for CA-GrQc can be attributed to its high density (), where density is defined as . The CA-GrQc graph is nearly complete, and its SDSP trees display a high degree of symmetry. In fact, many of the query trees are isomorphic to the majority of SDSP trees, and the majority of SDSP trees have a star shape (i.e. of the vertices in the tree are adjacent to the root). Each nonroot vertex in a star tree has the same path name, resulting in a large number of possible candidates per token sequence.
Table 3 depicts the cumulative distribution functions (CDFs) resulting from the experiments. The four Gnutella datasets exhibit a high recovery rate that can be explained by asymmetry and low density. Fifty percent of all queries for the p2p-Gnutella08, p2p-Gnutella04, p2p-Gnutella25, and p2p-Gnutella30 datasets result in at most 4, 3, 5, and 5 candidate query values, respectively. Details of the 50th, 90th, and 99th percentiles can be found in Table 1. The histograms of the results for QR on the real-world datasets (assuming 100% of the queries have been observed) are depicted in Table 4.
Random graphs. The attack was also deployed on random graphs, varying the number of nodes () and the edge probability (). QR was carried out after all queries had been issued. For each pair, 50 random graphs were generated and encrypted using the GKT scheme. The leakage for all possible SPSP queries was then generated, and the server-side QR attack was deployed on the responses. For each recovered multimap, the average number of candidate queries across all 50 graphs was computed. The CDFs of these results can be found in Table 5. The attack executed very quickly, with runtimes ranging from 0.471 s on graphs with to s on graphs with .
Histograms for quick recovery (QR) of the real-world datasets after observing 100% of the queries.
The number of candidate queries output by QR is plotted on the axis, and the number of queries is plotted on the -axis. The red dotted lines indicate the 50th, 90th, and 99th percentiles. An asterisk next to the dataset indicates that results were obtained via simulation; see the discussion for details.
CDFs for QR of random graphs for and after observing 100% of the queries.
CDF: cumulative distribution function; QR: query recovery.
The number of candidate queries output by the QR attack was plotted on the -axis, and the percentage of total queries was plotted on the axis. For each , 50 graphs were generated, and an average of the number of vertices with each given set size of candidate queries was taken. As the edge probability increases, the number of symmetries, and hence the number of candidate queries output, tends to increase.
PDFs for QR of random graphs after observing 100% of the queries.
PDF: probability density function; QR: query recovery.
The number of candidate queries output by the QR attack is plotted on the -axis, and the percentage of total queries is plotted on the -axis.
In general, an increase in and both result in an increase in the size of the maximum query candidate sets. For example, for , of all queries are uniquely recoverable and of all queries are recoverable to at most 10 candidate queries (representing of all queries). For , of all queries are uniquely recoverable and of all queries are recoverable to at most 107 candidate queries (representing of all queries).
As increases from to , the graphs become more dense, and a similar trend as seen in the real-world datasets can be observed. Denser graphs are closer to complete, and result in more symmetries and larger candidate query sets. As increases, more “waves” in the CDFs can also be observed; in the graphs showing the probability density functions (Table 6), these correspond to large clusters of candidate queries, all of which have the same path name and hence cannot be distinguished.
Discussion
This work describes a QR attack against the GKT GES from Ghosh et al.10 The attack model considered is strong, but fits within the model used in Ghosh et al.10 The attack begins with an offline preprocessing phase of the graph. In the online phase, the attack waits until it has observed all queries to at least one destination vertex and then outputs a list of candidates for each of these queries. The attack has the property that the output contains everything consistent with the leakage (and nothing more), and always contains the correct query. The attack was supported with a precise characterization of when FQR is possible, and evaluated against real-world and random graphs.
An alternative setting is to consider query reconstruction when arbitrary subsets of queries have been issued: then the adversary can construct partial query trees and attempt to identify isomorphic embeddings of them into the SDSP trees. It is an interesting open problem to develop an efficient attack for this setting. Yet another variant of the attack, for a network adversary, is described in Appendix A.
This paper highlights the need for detailed cryptanalysis of GESs. The value of such analysis was recognized in Ghosh et al.,10 but omitted on the grounds that the impact of the leakage is application-specific and can only be assessed in the context of particular use cases at the time of deployment. Such analysis, however, should be done in tandem with security proofs (establishing leakage profiles) at the same time as schemes are developed. Of course, attacks should be assessed with respect to real-world datasets whenever possible, as done here.
This work leaves open the question of whether other GESs can be similarly attacked. On the constructive side, the question of whether more secure schemes can be built that utilize chaining in a noninteractive manner and which support shortest path queries remains open. Another interesting line of research includes constructing practical interactive graph database schemes that minimize the communication overhead and number of rounds between the client and the server. Moreover, there is still much work to be done regarding the design of encrypted graph database schemes that can support a variety of queries—an important property for schemes in practical settings.
Footnotes
Acknowledgments
Work by F.F. was performed in part while visiting Brown University.
ORCID iDs
Francesca Falzon
Kenneth G. Paterson
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by the ThinkSwiss Research Scholarship, the U.S. National Science Foundation, and Armasuisse Science and Technology.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Attack for a network adversary
This section introduces a novel variant of the QR attack from the perspective of a network adversary. In practice, a secure communication protocol such as Transport Layer Security would be used to encrypt the communication between the client and the server, thereby mitigating such an attack. However, this attack is an interesting proof of concept that demonstrates how a slight change in the leakage profile can greatly impact the adversary’s ability to recover information.
The network adversary is assumed to know G, but is able to only observe the communication between the client and server, that is, the initial search token and its response (which is a sequence of the encrypted vertices in the path, not including the start vertex). This is in contrast to a server-side adversary that is able to observe the complete sequence of search tokens and encrypted vertices during look-up. The network-adversary attack can be summarized as follows:
Preprocess the graph offline (Algorithm 3). Compute the SDSP trees {Tv}v∈V of graph G. Then construct a multimap M such that M maps each path name arising in the Tv to the set of SPSP queries whose start vertices have the same path name.
Compute the query trees online. Construct the query trees from the sequence of ciphertexts contained in the responses. The nodes in the trees are labeled by the issued search tokens.
Process the query trees (Algorithm 5). Construct a dictionary D such that D maps each node in the query tree to the path name of the start vertex of the path. Additionally, compute a multimap M^ that maps search tokens to the possible nodes in the query tree that each token could correspond to.
Note that Step (A.) is the same as before, but Steps (A.) and (A.) must be adapted to this new setting. The latter two steps are described in detail below.
BronsonNAmsdenZCabreraG, et al.TAO: Facebook’s distributed data store for the social graph. In: 2013 USENIX annual technical conference (USENIX ATC 13). San Jose, CA: USENIX Association, 2013, pp.49–60.
MalewiczGAusternMHBikAJC, et al.Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, Indianapolis, IN, USA, 2010, pp.135–146. New York, NY, USA: Association for Computing Machinery.
6.
LowYBicksonDGonzalezJ, et al.Distributed graphLab: A framework for machine learning and data mining in the cloud. Proc VLDB Endow2012; 5: 716–727.
7.
ShaoBWangHLiY. Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data, SIGMOD’13, 2013, pp.505–516. New York, NY, USA: Association for Computing Machinery.
8.
ChaseMKamaraS. Structured encryption and controlled disclosure. In: Advances in cryptology – ASIACRYPT 2010 – 16th international conference on the theory and application of cryptology and information security, 2010, pp.577–594, Lecture notes in computer science, Vol. 6477. Singapore: Springer, Cham, Switzerland.
9.
MengXKamaraSNissimK, et al.GRECS: graph encryption for approximate shortest distance queries. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, Denver, Colorado, 2015, pp.504–517. New York, NY, USA: Association for Computing Machinery.
10.
GhoshEKamaraSTamassiaR. Efficient graph encryption scheme for shortest path queries. In: Proceedings of the 2021 ACM Asia conference on computer and communications security, ASIA CCS’21, Virtual, Hong Kong, 2021, pp.516–525. New York, NY, USA: Association for Computing Machinery.
11.
WangQRenKDuM, et al.SecGDB: graph encryption for exact shortest distance queries with efficient updates. In: Financial cryptography and data security – 21st international conference, FC 2017, Sliema, Malta, April 3–7, 2017, revised selected papers (ed A Kiayias), Lecture notes in computer science, Vol. 10322, 2017, pp.79–97. Cham, Switzerland: Springer.
12.
FalzonFPatersonKG. An efficient query recovery attack against a graph encryption scheme. In: Computer security – ESORICS 2022 – 27th European symposium on research in computer security, Copenhagen, Denmark, September 26–30, 2022, proceedings, part I (eds V Atluri, RD Pietro, CD Jensen and W Meng), lecture notes in computer science, Vol. 13554, 2022, pp.325–345. Berlin: Springer.
13.
SealfonA. Shortest paths and distances with differential privacy. In: Proceedings of the 35th ACM SIGMOD–SIGACT–SIGAI symposium on principles of database systems, PODS’16. New York, NY, USA: Association for Computing Machinery, 2016, pp.29–41.
14.
MouratidisKYiuML. Shortest path computation with no information leakage. Proc VLDB Endow2012; 5: 692–703.
15.
AhoAVHopcroftJEUllmanJ. Data structures and algorithms, 1st edn. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1983.
PohGSMohamadMSZ’abaMR. Structured encryption for conceptual graphs. In: Hanaoka G and Yamauchi T (eds) Advances in information and computer security. Berlin: Springer, 2012, pp.105–122.
LaiSYuanXSunS-F, et al.GraphSE2: an encrypted graph database for privacy-preserving social search. In: Proceedings of the 2019 ACM Asia conference on computer and communications security, Asia CCS’19. New York, NY, USA: Association for Computing Machinery, 2019, pp.41–54.
20.
SalaAZhaoXWilsonC, et al.Sharing graphs using differentially private graph models. In: Proceedings of the 2011 ACM SIGCOMM conference on internet measurement conference, IMC’11. New York, NY, USA: Association for Computing Machinery, 2011, pp.81–98.
21.
BlackstoneLKamaraSMoatazT. Revisiting leakage abuse attacks. In: 27th annual network and distributed system security symposium, NDSS 2020, San Diego, California, USA, February 23–26, 2020. San Diego, CA, USA: The Internet Society, 2020.
22.
CashDGrubbsPPerryJ, et al.Leakage-abuse attacks against searchable encryption. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security (CCS '15), Denver, Colorado, pp.668–679. New York, NY, USA: Association for Computing Machinery.
23.
ZhangYKatzJPapamanthouC. All your queries are belong to us: the power of file-injection attacks on searchable encryption. In: 25th USENIX security symposium (USENIX Security 16). Austin, TX: USENIX Association, 2016, pp.707–720.
24.
IslamMSKuzuMKantarciogluM. Access pattern disclosure on searchable encryption: ramification, attack and mitigation. In: 19th annual network and distributed system security symposium, NDSS 2012. San Diego, CA, USA: The Internet Society, 2012.
25.
PouliotDWrightCV. The shadow nemesis: inference attacks on efficiently deployable, efficiently searchable encryption. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security (CCS '16), Vienna, Austria, 2016, pp.1341–1352. New York, NY, USA: Association for Computing Machinery.
26.
GuiZPatersonKGPatranabisS. Rethinking searchable symmetric encryption. In: 44th IEEE symposium on security and privacy, SP 2023, San Francisco, CA, USA, May 21–25, 2023, pp.1401–1418. NY, USA: The Institute of Electrical and Electronics Engineers (IEEE).
27.
KellarisGKolliosGNissimK, et al.Generic attacks on secure outsourced databases. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security (CCS '16), Vienna, Austria, 2016, pp.1329–1340. New York, NY, USA: Association for Computing Machinery.
28.
LacharitéM-SMinaudBPatersonKG. Improved reconstruction attacks on encrypted data using range query leakage. In: IEEE symposium on security and privacy (SP), San Francisco, CA, USA, 2018, pp.297–314. NY, USA:The Institute of Electrical and Electronics Engineers (IEEE).
29.
GrubbsPLacharitéMMinaudB, et al.Pump up the volume: practical database reconstruction from volume leakage on range queries. In: Proceedings of the 2018 ACM SIGSAC conference on computer and communications security, CCS 2018, Toronto, ON, Canada, October 15–19, 2018 (eds D Lie, M Mannan, M Backes and X Wang), 2018, pp.315–331. New York, NY, USA: Association for Computing Machinery.
30.
GrubbsPLacharitéM-SMinaudB, et al.Learning to reconstruct: statistical learning theory and encrypted database attacks. In: Proceedings of IEEE symposium on security and privacy (SP), San Francisco, CA, USA, 2019, pp.1067–1083. NY, USA: Institute of Electrical and Electronics Engineers.
31.
GuiZJohnsonOWarinschiB. Encrypted databases: new volume attacks against range queries. In: Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, CCS 2019, London, UK, November 11–15, 2019 (eds L Cavallaro, J Kinder, X Wang and J Katz), 2019, pp.361–378. New York, NY, USA: Association for Computing Machinery.
32.
KornaropoulosEMPapamanthouCTamassiaR. The state of the uniform: attacks on encrypted databases beyond the uniform query distribution. In: Proceedings of IEEE symposium on security and privacy (SP), San Francisco, CA, USA, 2018, pp.297–314. NY, USA: Institute of Electrical and Electronics Engineers, 2020.
33.
KornaropoulosEMPapamanthouCTamassiaR. Response-hiding encrypted ranges: revisiting security via parametrized leakage-abuse attacks. In: Proceedings of IEEE symposium on security and privacy, San Francisco, CA, USA, 2021, pp.1502–1519. NY, USA: Institute of Electrical and Electronics Engineers.
34.
MarkatouEATamassiaR. Full database reconstruction with access and search pattern leakage. In: Information security – 22nd international conference, ISC 2019, New York City, NY, USA, September 16–18, 2019, proceedings, lecture notes in computer science, Vol. 11723, 2019, pp.25–43. Cham, Switzerland: Springer.
35.
FalzonFMarkatouEACashD, et al.Full database reconstruction in two dimensions. In: Proceedings of the 2020 ACM SIGSAC conference on computer and communications security (CCS '20), Virtual, 2020, pp.443–460. New York, NY, USA: Association for Computing Machinery.
36.
MarkatouEAFalzonFTamassiaR, et al. Reconstructing with less: leakage abuse attacks in two dimensions. In: Proceedings of the 2021 ACM SIGSAC conference on computer and communications security (CCS '21), Virtual, 2021, pp.2243–2261. New York, NY, USA: Association for Computing Machinery.
37.
KornaropoulosEMPapamanthouCTamassiaR. Data recovery on encrypted databases with -nearest neighbor query leakage. In: Proceedings of IEEE symposium on security and privacy 2019, (S&P 2019), San Francisco, CA, USA, 2019, pp.1033–1050. NY, USA: Institute of Electrical and Electronics Engineers (IEEE).
38.
GoetschmannA. Design and analysis of graph encryption schemes. Master’s Thesis, ETH Zürich, 2020.
39.
CormenTHLeisersonCERivestRL, et al.Introduction to algorithms, 3rd edn. The MIT Press, 2009.
40.
CashDJaegerJJareckiS, et al.Dynamic searchable encryption in very-large databases: data structures and implementation. In: 21st annual network and distributed system security symposium 2014, NDSS 2014. San Diego, CA, USA: The Internet Society, 2014.
LeskovecJKrevlA. SNAP datasets: Stanford large network dataset collection, 2014.
44.
CharikarM. Greedy approximation algorithms for finding dense components in a graph. In: Jansen K and S Khuller S (eds) Approximation algorithms for combinatorial optimization. Berlin: Springer, 2000, pp.84–95.