Perturbation-Based Schemes with Ultra-Lightweight Computation to Protect User Privacy in Smart Grid

Abstract

In a smart grid, smart meters are deployed to collect power consumption data periodically, and the data are analyzed to improve the efficiency of power transmission and distribution. The collected consumption data may leak the usage patterns of domestic appliances and thus damage the behavioral privacy of customers. Most related work on protecting data privacy in smart grid relies on cryptographic primitives, for example, encryption, which induces a large power consumption overhead. In this paper, we make the first attempt to propose solutions that protect user privacy without any cryptographic computation. Privacy in smart grid is formally defined in the paper. Three schemes are proposed: the random perturbation scheme (RPS), the random walk scheme (RWS), and the distance-bounded random walk with perturbation scheme (DBS). One algorithm is proposed for each scheme. All schemes are ultra-lightweight in terms of computation and do not rely on any cryptographic primitive. The privacy, soundness, and accuracy of the proposed schemes are guaranteed and justified by rigorous analysis.

1. Introduction

Smart grid is a typical application of the Internet of Things, M2M, or IP-based sensor networks. It has been envisioned as a key method to reduce carbon dioxide emissions and slow climate change by improving the efficiency of power distribution and transmission.

Smart grid relies on smart meters to collect power consumption data at user ends in real time. Smart meters report the power consumption data periodically to the smart grid control center (SGCC). SGCC can thus allocate the necessary power distribution and schedule the required power transmission. In addition, SGCC can shift the power demand at user ends by announcing power prices to users. Users can then schedule the usage of their household appliances according to the forthcoming price.

As smart meters report the power consumption data periodically, the data may leak user privacy in daily life. For example, the data may be used to deduce user behavior patterns, such as when she gets up (from the use of a microwave oven or toaster in the morning), when she comes back home (from the use of an electric stove for cooking in the afternoon), or when she takes a bath or goes to bed at night (from the use of a water heater or lamps). Such privacy concerns have already been acknowledged and reported by NIST [1] and significantly affect the deployment of smart meters.

Although several privacy protection and security improvement schemes for smart grid exist [2–6], most of them rely on cryptographic primitives, for example, encrypting the uploaded data at smart meters. Cryptographic operations are usually not lightweight, so they induce extra power consumption at smart meters. In addition, data are uploaded frequently and periodically, so encryption is performed very often. For example, if data are uploaded to SGCC once every 10 minutes, encryption has to be performed 144 times a day. Thus, the energy consumed by encryption over a month would be large even at a single smart meter. Moreover, because the number of smart meters in smart grid is huge, the extra power consumption accumulates into an unacceptable waste. Furthermore, if the uploaded data are encrypted at smart meters, they have to be decrypted at SGCC, so the energy consumption at SGCC also increases considerably. Last but not least, smart meters usually have resource and power constraints, like traditional sensors. As privacy protection must be carried out at smart meters, any computation for privacy protection should consume little energy to respect these constraints, so frequent encryption operations are undesirable. Even if the encryption is lightweight in certain situations, key management for encryption remains a difficult deployment issue. Therefore, privacy protection by encryption contradicts the energy-saving intention of smart grid; an ultra-lightweight method without any cryptographic computation is needed for privacy protection in the long run and at large scale.

In this paper, we propose perturbation-based schemes with ultra-lightweight computation and without any cryptographic computation. In addition, we formally define and prove their privacy protection strength. We adopt a rigorous method to state, present, and analyze the privacy protection achievements. All our presentations follow formal expressions for better clarity and generality.

The contributions of the paper are as follows: (i) we propose privacy-protection schemes that are ultra-lightweight in terms of computation (and thus energy consumption) and use no cryptographic computation; (ii) we formally define the requirements on privacy, soundness, and accuracy in smart grid and prove that the schemes satisfy those requirements.

The rest of the paper is organized as follows. In Section 2 we discuss the basic assumption and models used throughout the paper. Section 3 provides the detailed description of our proposed models and analysis. Section 4 gives an overview on relevant prior work. Finally, Section 5 concludes the paper.

2. Problem Formulation

2.1. Network Model

Two major entities exist in smart grid: smart meter (denoted by SM hereafter) and SGCC.

SM computes power consumption data and uploads them to SGCC periodically. The period for computing power consumption data at SM is called the sensing period. The period for uploading power consumption data to SGCC is called the uploading period. Without loss of generality, suppose the sensing period and the uploading period are both t minutes. The number of sensing times and uploading times in a day will thus be $n = [24 * 60 / t]$. The total sensing data for a day are denoted as a set $DATA_{s} = \{ds_{1}, \dots, ds_{n}\}$. The total uploading data for a day are denoted as a set $DATA_{u} = \{du_{1}, \dots, du_{n}\}$. If SM does not hide $DATA_{s}$, $DATA_{u}$ will be the same as $DATA_{s}$.

In smart grid, the utility price may vary across time slots. The price information is delivered by SGCC in advance. Users use this information to guide their power consumption. SM receives this information to calculate the monthly utility charge for users. Suppose the prices for the n uploading periods in a day are denoted as a set $PRICE = \{p_{1}, p_{2}, \dots, p_{n}\}$. Thus, the total utility charge for a day is $\sum_{i = 1}^{n} du_{i} * p_{i}$. The total utility charge for a month is the sum of the charges for all days in the month. If the sensing data are changed into the uploading data to protect privacy, the total utility charge for a day should remain correct.
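The quantities of the network model can be illustrated with a minimal Python sketch. The function names and sample values are illustrative assumptions, not part of the paper, and the bracket in the expression for n is read here as integer (floor) division.

```python
from typing import Sequence


def periods_per_day(t_minutes: int) -> int:
    """Number of sensing/uploading periods per day for a period of t minutes.

    Assumption: the bracket in n = [24 * 60 / t] is read as floor division.
    """
    if t_minutes <= 0:
        raise ValueError("period must be positive")
    return (24 * 60) // t_minutes


def daily_charge(data_u: Sequence[float], price: Sequence[float]) -> float:
    """Total utility charge for a day: the sum of du_i * p_i over all periods."""
    if len(data_u) != len(price):
        raise ValueError("data and price sets must have the same length")
    return sum(du * p for du, p in zip(data_u, price))
```

With a 10-minute period, `periods_per_day(10)` gives the 144 uploads per day mentioned in the introduction.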

2.2. Attack Model and Trust Model

Only adversaries who attack user privacy are considered in this paper. Adversaries who can eavesdrop on the channels between SM and SGCC are denoted as $𝒜_{c}$. Adversaries at SGCC, who can access all data uploaded by SM, are denoted as $𝒜_{s}$. Both kinds of adversaries want to deduce the user behaviors in a day by analyzing the data uploaded by SM, namely, $DATA_{u}$. As $𝒜_{c}$ and $𝒜_{s}$ have the same view of $DATA_{u}$, we do not distinguish between them further. Both are denoted by the same notation 𝒜.

SGCC is untrustworthy, as we assume adversaries at SGCC are interested in user privacy. SM should be trustworthy. This is a prerequisite for any further discussion, since the sensing data reside at SM and all possible solutions are carried out at SM. Besides, if an SM is untrustworthy, users will not choose it. SM can be easily evaluated and authorized by a Trusted Third Party (TTP).

2.3. Security Definition and Design Goal

Informally speaking, the privacy is guaranteed if the adversaries (not only at SGCC but also at channels between SGCC and SM) cannot deduce the user activities in a day. More specifically, we formally state the privacy requirement definition as follows.

Definition 1.

User activities. They are the activities whose disclosure damages user privacy and that are related to using one or multiple household appliances in daily life. They are denoted as a set $ACT = \{a_{1}, a_{2}, \dots, a_{m}\}$, where $a_{i} (i = 1, \dots, m)$ is an activity related to one or multiple appliances.

Definition 2.

Deduce. It means that an activity in $ACT$ can be inferred from data in $DATA_{b}$. If an activity $a_{j} \in ACT$ is inferred from data $d_{i} \in DATA_{b}$, this is denoted as a relation $(d_{i}, a_{j}) \in R \subseteq DATA_{b} \times ACT$, where R is a deduction relationship set that is defined in advance and empirically; $DATA_{b}$ is the set of “bad” data, each of which can be used to infer at least one activity in $ACT$.

Definition 3.

Perfect full privacy (denoted as $Privacy_{full}^{p}$). Simply speaking, no adversary 𝒜 can deduce any activity in $ACT$ from any element of $DATA_{u}$ after viewing $DATA_{u}$. More specifically, given any $du_{i} \in DATA_{u}$, it is impossible for any adversary 𝒜 to find $a_{j} \in ACT$ such that $(du_{i}, a_{j}) \in R$. That is,

$$\Pr\{\forall du_{i} \in DATA_{u}, a_{j} \Leftarrow ACT, \text{ s.t. } (du_{i}, a_{j}) \in R : DATA_{u}\} = 0, \tag{1}$$

where $\Pr\{A : B\}$ denotes the probability that event “A” happens after viewing “B”; “$\Leftarrow$” means “is selected from”; “,” means that two operations happen consecutively; and “s.t.” is shorthand for “such that.”

Definition 4.

Computational full privacy (denoted as $Privacy_{full}^{c}$). Given any $du_{i} \in DATA_{u}$, it is computationally infeasible for any probabilistic polynomial-time Turing machine (PPTM) adversary 𝒜 to find $a_{j} \in ACT$ such that $(du_{i}, a_{j}) \in R$. That is,

$$\Pr\{\forall du_{i} \in DATA_{u}, a_{j} \Leftarrow ACT, \text{ s.t. } (du_{i}, a_{j}) \in R : DATA_{u}\} < negl(z), \tag{2}$$

where $negl(z)$ is a negligible function with security parameter z.

Claim 1.

Perfect (computational) full privacy can protect user privacy on all user activities in a day, as no activity can be deduced from data in $D A T A_{u}$ by any (PPTM) adversary.

In the previous claim, the terms in parentheses correspond to each other. Similarly, perfect (computational) partial privacy can be defined as follows.

Definition 5.

Perfect (computational) partial privacy, denoted as $Privacy_{partial}^{p (c)}$. Given at least one $du_{i} \in DATA_{u}$, it is impossible (computationally infeasible) for any (PPTM) adversary 𝒜 to find $a_{j} \in ACT$ such that $(du_{i}, a_{j}) \in R$ after viewing $DATA_{u}$. Besides, given at least one $a_{j} \in ACT$, it is impossible (computationally infeasible) for any (PPTM) adversary 𝒜 to find $du_{i} \in DATA_{u}$ such that $(du_{i}, a_{j}) \in R$ after viewing $DATA_{u}$. That is,

$$\begin{aligned} &\Pr\{\exists du_{i} \in DATA_{u}, a_{j} \Leftarrow ACT, \text{ s.t. } (du_{i}, a_{j}) \in R : DATA_{u}\} = 0 \ (< negl(z)), \\ &\Pr\{\exists a_{j} \in ACT, du_{i} \Leftarrow DATA_{u}, \text{ s.t. } (du_{i}, a_{j}) \in R : DATA_{u}\} = 0 \ (< negl(z)). \end{aligned} \tag{3}$$

Claim 2.

Perfect (computational) partial privacy can protect certain privacy-sensitive activities, as these activities cannot be deduced from $DATA_{u}$ by any (PPTM) adversary.

Claim 3.

Full privacy is stronger than partial privacy in terms of the number of deducible data in $DATA_{u}$. Perfect privacy is stronger than computational privacy in terms of the adversary's ability. That is,

$$Privacy_{partial}^{c} < Privacy_{partial}^{p} < Privacy_{full}^{c} < Privacy_{full}^{p}, \tag{4}$$

where “$A < B$” means that the privacy protection strength of “A” is weaker than that of “B”.

Roughly speaking, full privacy protects all activities; partial privacy protects only some activities. Perfect privacy defends against any adversary; computational privacy defends against any PPTM adversary. As perfect full privacy offers the strongest protection, we concentrate on perfect full privacy in the following.

Definition 6.

The full privacy attacking experiment on the scheme Π defending against any adversary 𝒜, denoted $ExpPrivacy_{full}^{p, 𝒜, Π}$, is defined as follows:

(1) the scheme Π is executed in the presence of any adversary 𝒜;

(2) 𝒜 fully accesses $DATA_{u}$, $ACT$, and R. Given any $du_{i} \in DATA_{u}$, if 𝒜 can find $a_{j} \in ACT$ such that $(du_{i}, a_{j}) \in R$, 𝒜 outputs 1; otherwise, 𝒜 outputs 0;

(3) if and only if 𝒜 outputs 1, the experiment outputs 1.
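The attacking experiment of Definition 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: R is represented as a set of (data value, activity label) pairs, data values are compared for exact equality, and the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed representation of R: pairs (data value, activity label).
Relation = Set[Tuple[float, str]]


def full_privacy_experiment(data_u: Iterable[float], relation: Relation) -> int:
    """Return 1 if the adversary can link some uploaded value to an activity.

    The adversary sees DATA_u and knows R; it succeeds as soon as one
    uploaded value appears as the data component of a pair in R.
    """
    deducible_values = {d for d, _activity in relation}
    return 1 if any(du in deducible_values for du in data_u) else 0
```

A scheme meets the goal of perfect full privacy in this sketch when the function returns 0 for every `data_u` the scheme can produce.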

Definition 7.

A scheme Π that guarantees perfect full privacy in the presence of any adversary 𝒜 (denoted as $Privacy_{full}^{p, 𝒜, Π} = 1$) is defined as follows.

For any adversary 𝒜 that the scheme Π defends against, the probability that the output of the full privacy attacking experiment equals one is 0. That is, $Privacy_{full}^{p, 𝒜, Π} = 1$ if and only if

$$\Pr[ExpPrivacy_{full}^{p, 𝒜, Π} = 1] = 0. \tag{5}$$

Therefore, the design goal is to propose a scheme Π that satisfies $Privacy_{full}^{p, 𝒜, Π} = 1$ and, importantly, requires only ultra-lightweight computation without any cryptographic computation.

3. Proposed Schemes

3.1. Problem Reduction

To protect the privacy of the sensing data $DATA_{s}$, a naive method is to encrypt them at SM and then upload them to SGCC. As SGCC is untrustworthy, SGCC must not decrypt them and has to consult a TTP. The TTP decrypts the data, but the decrypted result cannot be sent to SGCC. Instead, the TTP should compute accumulated values (or metadata) and send them to SGCC for further scheduling and charging. This obviously incurs multiple overheads: a large computation overhead at SM; extra communication overhead at SM and SGCC; an extra entity, the TTP; and key management overhead between SM and the TTP.

As SM is trustworthy, we propose to equip SM with a trusted mixing layer between the sensing layer and the communication layer. That is, SM is modeled as a 3-tuple $\langle L_{s}, L_{m}, L_{c} \rangle$, where $L_{s}$ is a sensing layer that computes the power consumption periodically and outputs $DATA_{s}$; $L_{m}$ is a mixing layer that transforms $DATA_{s}$ into $DATA_{u}$; and $L_{c}$ is a communication layer that uploads $DATA_{u}$ to SGCC. That is,

$$\begin{aligned} SM &::= \langle L_{s}, L_{m}, L_{c} \rangle, \\ L_{s} &\Rightarrow L_{m} : DATA_{s}, \\ L_{m} &::= F : DATA_{s} \to DATA_{u}, \\ L_{m} &\Rightarrow L_{c} : DATA_{u}, \end{aligned} \tag{6}$$

where “$::=$” means “is defined as”; “$\Rightarrow$” means “data transferring between layers”; F is a data transforming function; and “$\to$” means that the input of the function F is transformed into the output of the function F. Therefore, the rest of the paper concentrates on finding an ultra-lightweight transformation function F with $Privacy_{full}^{p, 𝒜, F} = 1$.

Definition 8.

“Bad” data set ($DATA_{b}$). It consists of all power consumption data from which one or multiple activities in $ACT$ can be deduced. $DATA_{b} = \{db_{1}, db_{2}, \dots, db_{o}\}$, where o is the total number of elements $db_{i} \in DATA_{b} (i = 1, \dots, o)$.

The characteristics of $DATA_{b}$, $ACT$, and the deduction relationship set R are as follows.

(1) Without loss of generality, $DATA_{b}$ is a sorted set of positive numbers. That is, $db_{1} < db_{2} < \dots < db_{o}$. $db_{1}$ is equal to or greater than the power consumption of the minimum power consumption appliance in a period. $db_{o}$ is equal to or less than the power consumption of all appliances in a period.

(2) Any $db_{i} \in DATA_{b} (i = 1, \dots, o)$ may represent the usage of one appliance in a period. For example, $db_{1}$ (30 Wh) is the power consumption of a lamp for a period. $db_{1}$ is related to an event (e.g., $a_{1}$) that means the lamp is on in the period.

(3) Any $db_{i} \in DATA_{b} (i = 1, \dots, o)$ may also represent the usage of multiple household appliances. For example, $db_{9}$ represents two household appliances used simultaneously. $db_{9} = db_{1} + db_{2}$, where $db_{1}$ is the power consumption of the lamp in a period and $db_{2}$ is the power consumption of the washing machine in the period. Thus, $db_{9}$ means using the lamp and the washing machine simultaneously in the period.

(4) Similarly, any $a_{j} \in ACT (j = 1, \dots, m)$ may represent the usage of one appliance or of multiple household appliances simultaneously.

(5) Any $db_{i} \in DATA_{b} (i = 1, \dots, o)$ is related to at least one $a_{j} \in ACT (j = 1, \dots, m)$; any $a_{j} \in ACT (j = 1, \dots, m)$ is related to at least one $db_{i} \in DATA_{b} (i = 1, \dots, o)$.

(6) Different $db_{i} \in DATA_{b} (i = 1, \dots, o)$ cannot be related to the same $a_{j} \in ACT (j = 1, \dots, m)$, as any $a_{j} \in ACT (j = 1, \dots, m)$ has a single power consumption in a period.

(7) A $db_{i} \in DATA_{b} (i = 1, \dots, o)$ may be related to multiple $a_{j}$, because such a $db_{i}$ may be the power consumption of multiple appliances, and different groups of appliances may have the same total power consumption. For example, $db_{9} = db_{1} + db_{2} = db_{3} + db_{4}$. $db_{9}$ is related to $a_{5}, a_{6} \in ACT$, where $a_{5}$ means using the lamp and the washing machine simultaneously and $a_{6}$ means the usage of the other two appliances.

In summary, the deduction relationship set $R \subseteq DATA_{b} \times ACT$ can be further refined from a general relationship set to a relationship set with the following properties:

$$\begin{aligned} &\Pr\{\forall db_{i} \in DATA_{b}, \exists a_{j} \in ACT, \text{ s.t. } (db_{i}, a_{j}) \in R\} = 1, \\ &\Pr\{\forall a_{j} \in ACT, \exists db_{i} \in DATA_{b}, \text{ s.t. } (db_{i}, a_{j}) \in R\} = 1, \\ &\Pr\{\exists db_{i} \in DATA_{b}, a_{j1} \in ACT, a_{j2} \in ACT, a_{j1} \neq a_{j2}, \text{ s.t. } (db_{i}, a_{j1}) \in R, (db_{i}, a_{j2}) \in R\} > 0, \\ &\Pr\{\exists db_{i1} \in DATA_{b}, db_{i2} \in DATA_{b}, db_{i1} \neq db_{i2}, a_{j} \in ACT, \text{ s.t. } (db_{i1}, a_{j}) \in R, (db_{i2}, a_{j}) \in R\} = 0. \end{aligned} \tag{7}$$

In other words, the mapping $DATA_{b} \to ACT$ is not a function, whereas the mapping $ACT \to DATA_{b}$ is a surjective function that need not be injective.
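The structural properties of R stated above can be checked mechanically. The following Python sketch assumes R is given as a set of (data value, activity label) pairs; the function name and the sample appliance labels in the usage note are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def satisfies_relation_properties(
    data_b: Iterable[float],
    act: Iterable[Hashable],
    relation: Set[Tuple[float, Hashable]],
) -> bool:
    """Check that every bad value has at least one activity and that every
    activity is related to exactly one bad value (ACT -> DATA_b is a
    surjective function)."""
    data_b, act = set(data_b), set(act)
    # Every pair must lie inside DATA_b x ACT.
    if any(d not in data_b or a not in act for d, a in relation):
        return False
    values_per_activity = {a: set() for a in act}
    covered_values = set()
    for d, a in relation:
        values_per_activity[a].add(d)
        covered_values.add(d)
    every_value_covered = covered_values == data_b
    one_value_per_activity = all(len(v) == 1 for v in values_per_activity.values())
    return every_value_covered and one_value_per_activity
```

For instance, a relation in which 530 is linked to both "lamp+washer" and "heater+fan" passes the check, whereas linking "washer" to two different values does not.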

Definition 9.

After transformation F, the privacy of $DATA_{u}$ is guaranteed (denoted as $Privacy_{DATA_{u}}^{F} = 1$). $Privacy_{DATA_{u}}^{F} = 1$ if

$$\forall ds_{i} \in DATA_{s}, du_{i} = F(ds_{i}) \notin DATA_{b}, \tag{8}$$

where $F : DATA_{s} \to DATA_{u}$.

Definition 10.

After transformation F, the soundness of $DATA_{u}$ is guaranteed ($Soundness_{DATA_{u}}^{F} = 1$) if the total utility charge remains unchanged. That is,

$$\sum_{i = 1}^{n} ds_{i} * p_{i} = \sum_{i = 1}^{n} du_{i} * p_{i}. \tag{9}$$

The research problem is thus reduced to the following: given $DATA_{s}$, find an ultra-lightweight transformation $F : DATA_{s} \to DATA_{u}$ such that the privacy and soundness of $DATA_{u}$ are both guaranteed. That is, given $DATA_{s}$, find $F : DATA_{s} \to DATA_{u}$, s.t. $Privacy_{DATA_{u}}^{F} = 1$ and $Soundness_{DATA_{u}}^{F} = 1$.
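The two requirements of the reduced problem can be written as simple predicates. The Python sketch below is illustrative only: the function names are hypothetical, and the floating-point tolerance in the soundness check is an assumption that the paper does not discuss.

```python
import math
from typing import Sequence


def privacy_holds(data_u: Sequence[float], data_b: Sequence[float]) -> bool:
    """Definition 9: no uploaded value may coincide with a "bad" value."""
    bad = set(data_b)
    return all(du not in bad for du in data_u)


def soundness_holds(
    data_s: Sequence[float],
    data_u: Sequence[float],
    price: Sequence[float],
    tol: float = 1e-9,  # assumed tolerance for floating-point rounding
) -> bool:
    """Definition 10: the daily utility charge must be unchanged."""
    charge_s = sum(ds * p for ds, p in zip(data_s, price))
    charge_u = sum(du * p for du, p in zip(data_u, price))
    return math.isclose(charge_s, charge_u, rel_tol=tol, abs_tol=tol)
```

A candidate transformation F can then be tested by applying it to sample data and evaluating both predicates on the result.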

Next, we propose a family of schemes to solve the problem. We list all major notations used in the remainder of the paper in Table 1.

Table 1: Notation.

Symbol	Meaning
𝒜	Adversary
$DATA_{s} = \{ds_{1}, \dots, ds_{n}\}$	Sensing power consumption data set
$DATA_{u} = \{du_{1}, \dots, du_{n}\}$	Uploading power consumption data set
F	Transforming function from $DATA_{s}$ to $DATA_{u}$
$DATA_{b} = \{db_{1}, \dots, db_{o}\}$	“Bad” power consumption data set
$PRICE = \{p_{1}, p_{2}, \dots, p_{n}\}$	Price set
$ACT = \{a_{1}, a_{2}, \dots, a_{m}\}$	Activity set

3.2. Random Perturbation Scheme (RPS)

We first propose a basic scheme, the random perturbation scheme (RPS), to illustrate our motivation. In RPS, a value $ds_{i} \in DATA_{s}$ that equals some $db_{j}$ is perturbed into a new value either midway between $db_{j - 1}$ and $db_{j}$ or midway between $db_{j}$ and $db_{j + 1}$; one of the two cases is selected at random. A value that lies strictly between two adjacent bad values is moved to their midpoint. A random perturbation algorithm (RPA) is proposed for the transformation F as follows.

3.2.1. Analysis of Algorithm 1

Algorithm 1: Random Perturbation Algorithm (RPA).

Require: $DATA_{s}$, $DATA_{b}$, $PRICE$

Ensure: $DATA_{u}$, $Privacy_{DATA_{u}}^{RPA} = 1$, $Soundness_{DATA_{u}}^{RPA} = 1$.

$BIAS \Leftarrow 0$

for $i = 1$ to $n - 1$ do

$ds_{i} \Leftarrow GetData(DATA_{s})$ //Get the next datum from $DATA_{s}$.

if (($ds_{i} == db_{1}$) .OR. ($db_{1} < ds_{i} < db_{2}$)) then

$du_{i} \Leftarrow (db_{1} + db_{2}) / 2$

$δ \Leftarrow ds_{i} - du_{i}$

$BIAS \Leftarrow BIAS + δ * p_{i}$

else if (($ds_{i} == db_{o}$) .OR. ($db_{o - 1} < ds_{i} < db_{o}$)) then

$du_{i} \Leftarrow (db_{o - 1} + db_{o}) / 2$

$δ \Leftarrow ds_{i} - du_{i}$

$BIAS \Leftarrow BIAS + δ * p_{i}$

else

for $j = 2$ to $o - 1$ do

if ($ds_{i} == db_{j}$) then

$du_{i} \Leftarrow iff(random() \% 2, (db_{j} + db_{j + 1}) / 2, (db_{j} + db_{j - 1}) / 2)$

$δ \Leftarrow ds_{i} - du_{i}$

$BIAS \Leftarrow BIAS + δ * p_{i}$

else if ($db_{j} < ds_{i} < db_{j + 1}$) then

$du_{i} \Leftarrow (db_{j} + db_{j + 1}) / 2$

$δ \Leftarrow ds_{i} - du_{i}$

$BIAS \Leftarrow BIAS + δ * p_{i}$

end if

end for

end if

end for

$du_{n} \Leftarrow ds_{n} + BIAS / p_{n}$ //For soundness
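The pseudocode of RPA can be turned into a short runnable Python sketch. The function name `rpa` and the use of `bisect` to locate the interval are choices of this sketch, not of the paper. It also makes three assumptions that the pseudocode leaves open: a sensed value outside $[db_{1}, db_{o}]$ is uploaded unchanged, all prices are positive, and the compensated last value $du_{n}$ is not checked against $DATA_{b}$.

```python
import bisect
import random
from typing import List, Optional, Sequence


def rpa(
    data_s: Sequence[float],
    data_b: Sequence[float],
    price: Sequence[float],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Random perturbation: move each sensed value to a midpoint between
    two adjacent bad values, then compensate the charge in the last upload."""
    rng = rng or random.Random()
    if not data_s or len(data_s) != len(price):
        raise ValueError("data_s and price must be non-empty and equally long")
    if len(data_b) < 2 or any(b <= a for a, b in zip(data_b, data_b[1:])):
        raise ValueError("data_b must be strictly increasing with >= 2 values")
    last = len(data_b) - 1
    data_u: List[float] = []
    bias = 0.0
    for ds, p in zip(data_s[:-1], price[:-1]):
        if ds < data_b[0] or ds > data_b[last]:
            du = ds  # assumption: values outside the bad range are kept
        else:
            k = bisect.bisect_left(data_b, ds)  # first index with data_b[k] >= ds
            if data_b[k] != ds:
                # ds lies strictly between data_b[k - 1] and data_b[k]
                du = (data_b[k - 1] + data_b[k]) / 2
            elif k == 0:
                du = (data_b[0] + data_b[1]) / 2
            elif k == last:
                du = (data_b[last - 1] + data_b[last]) / 2
            else:
                # ds equals an interior bad value: pick a side at random
                neighbour = k + 1 if rng.random() < 0.5 else k - 1
                du = (data_b[k] + data_b[neighbour]) / 2
        bias += (ds - du) * p
        data_u.append(du)
    # The last upload absorbs the accumulated charge difference (soundness).
    data_u.append(data_s[-1] + bias / price[-1])
    return data_u
```

Passing a seeded `random.Random` instance makes a run reproducible, which is convenient when checking the soundness equation on sample data.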

Proposition 11.

After the transformation of algorithm RPA, the soundness of $D A T A_{u}$ is guaranteed. ( $Soundnes s_{D A T A_{u}}^{R P A} = 1 .)$

Proof.

The biases of $du_{i} (i = 1, \dots, n - 1)$ relative to $ds_{i} (i = 1, \dots, n - 1)$, weighted by the prices $p_{i}$, are accumulated into a total value $BIAS$. $BIAS$ is converted into extra power consumption at price $p_{n}$ and added to the last value $du_{n}$. Thus, $\sum_{i = 1}^{n} ds_{i} * p_{i} = \sum_{i = 1}^{n} du_{i} * p_{i}$. The total cost of power consumption in a day keeps the correct value, so $Soundness_{DATA_{u}}^{RPA} = 1$.
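The equality used in the proof can be spelled out step by step. The derivation below writes $δ_{i} = ds_{i} - du_{i}$ for the perturbation in period i (the index on δ is added here for clarity) and assumes $p_{n} \neq 0$.

```latex
\begin{align*}
\sum_{i=1}^{n} du_i \, p_i
  &= \sum_{i=1}^{n-1} (ds_i - \delta_i)\, p_i
     + \Bigl( ds_n + \frac{BIAS}{p_n} \Bigr) p_n \\
  &= \sum_{i=1}^{n} ds_i \, p_i - \sum_{i=1}^{n-1} \delta_i \, p_i + BIAS \\
  &= \sum_{i=1}^{n} ds_i \, p_i ,
  \qquad \text{since } BIAS = \sum_{i=1}^{n-1} \delta_i \, p_i .
\end{align*}
```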

Proposition 12.

The scheme RPS is ultra-lightweight.

Proof.

The number of loop iterations in algorithm RPA is at most $(n - 1) * (o - 2)$. Each iteration involves only simple operations such as modulo, subtraction, addition, division, and multiplication. The computational complexity of algorithm RPA is therefore $O(n * o)$, and no cryptographic operation is involved. Thus, scheme RPS is ultra-lightweight.

Proposition 13.

The scheme RPS can guarantee the perfect full privacy. ( $Privac y_{f u l l}^{p, 𝒜, R P S} = 1$ .)

Proof.

It is clear that $\forall ds_{i} \in DATA_{s}, du_{i} = F(ds_{i}) \notin DATA_{b}$. Thus, $Privacy_{DATA_{u}}^{RPA} = 1$. According to the definition of perfect full privacy, $Privacy_{full}^{p, 𝒜, RPS} = 1$.

3.3. Random Walk Scheme (RWS)

In RPS, the size of the perturbation (namely, δ) is limited by the gap between adjacent values $db_{j}$ and $db_{j + 1}$ ( $j = 1, \dots, o - 1$ ). This is stated and proved as the following claim.

Claim 4.

If the gap between $d b_{j}$ and $d b_{j + 1}$ ( $j = 1, \dots, o - 1$ ) is small, the perturbation in RPS will be small.

Proof.

Suppose $\max(| db_{j} - db_{j + 1} |) = g (j = 1, \dots, o - 1)$. If $ds_{i} = db_{j} \in DATA_{b}$ in RPA, $\max(| δ |) \leq g / 2$. If $ds_{i} \notin DATA_{b}$, $\max(| δ |) < g / 2$. Thus, the perturbation δ is small if g is small.

If the perturbation is small, adversaries may guess $ds_{i}$ correctly, or at least narrow the activity down to one of two candidates. To address this issue, we propose a random walk scheme (RWS) in which $ds_{i} \in DATA_{s}$ randomly jumps to a value in $DATA_{b}$. In this case, the privacy definition is extended to include unlinkability, in which every $db_{j} \in DATA_{b}$ is equally likely to be uploaded for $ds_{i}$. Thus, every revealed user activity occurs with equal probability.

Definition 14.

After transformation F, the privacy of $DATA_{u}$ is guaranteed (denoted as $Privacy_{DATA_{u}}^{F} = 1$) if

$$\begin{aligned} &\forall ds_{i} \in DATA_{s}, du_{i} = F(ds_{i}) \in DATA_{u}, \\ &\forall db_{p}, db_{q} \in DATA_{b}, db_{p} \neq db_{q}, \\ &\Pr\{du_{i} = db_{p}\} = \Pr\{du_{i} = db_{q}\}. \end{aligned} \tag{10}$$

The definition for privacy is thus extended to include the definition here and Definition 9.

In RWS, any $ds_{i} \in DATA_{s} (i = 1, \dots, n - 1)$ is perturbed to a randomly selected value $db_{j} \in DATA_{b}$ ( $j = 1, \dots, o$ ). The algorithm is therefore especially lightweight in terms of computation. A random walk algorithm (RWA) is proposed for the transformation function F as follows.

3.3.1. Analysis of Algorithm 2

Algorithm 2: Random Walk Algorithm (RWA).

Require: $DATA_{s}$, $DATA_{b}$, $PRICE$

Ensure: $DATA_{u}$, $Privacy_{DATA_{u}}^{RWA} = 1$, $Soundness_{DATA_{u}}^{RWA} = 1$.

$BIAS \Leftarrow 0$

for $i = 1$ to $n - 1$ do

$ds_{i} \Leftarrow GetData(DATA_{s})$ //Get the next datum from $DATA_{s}$.

$j \Leftarrow (random() \% o + 1)$

$du_{i} \Leftarrow db_{j}$

$δ \Leftarrow ds_{i} - du_{i}$

$BIAS \Leftarrow BIAS + δ * p_{i}$

end for

$du_{n} \Leftarrow ds_{n} + BIAS / p_{n}$ //For soundness
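A minimal Python sketch of RWA follows. The function name `rwa` and the optional `rng` parameter are assumptions made for illustration; prices are assumed to be positive, and, as in the pseudocode, the compensated last value is not forced into $DATA_{b}$.

```python
import random
from typing import List, Optional, Sequence


def rwa(
    data_s: Sequence[float],
    data_b: Sequence[float],
    price: Sequence[float],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Random walk: replace each sensed value by a uniformly chosen bad
    value, then compensate the charge difference in the last upload."""
    rng = rng or random.Random()
    if not data_s or len(data_s) != len(price):
        raise ValueError("data_s and price must be non-empty and equally long")
    if not data_b:
        raise ValueError("data_b must not be empty")
    data_u: List[float] = []
    bias = 0.0
    for ds, p in zip(data_s[:-1], price[:-1]):
        du = data_b[rng.randrange(len(data_b))]  # uniform over DATA_b
        bias += (ds - du) * p
        data_u.append(du)
    # The last upload absorbs the accumulated charge difference (soundness).
    data_u.append(data_s[-1] + bias / price[-1])
    return data_u
```

Because every uploaded value except the last is drawn uniformly from `data_b`, each bad value is equally likely, which is the property required by Definition 14.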

Proposition 15.

After the transformation of algorithm RWA, the soundness of $D A T A_{u}$ is guaranteed. ( $Soundnes s_{D A T A_{u}}^{R W A} = 1$ .)

Proof.

The proof is similar to the proof of Proposition 11. As $\sum_{i = 1}^{n} ds_{i} * p_{i} = \sum_{i = 1}^{n} du_{i} * p_{i}$, the total cost of power consumption in a day keeps the correct value. Thus, the soundness of RWA is guaranteed.

Proposition 16.

The scheme RWS is ultra-lightweight.

Proof.

The number of loop iterations in algorithm RWA is $n - 1$. The computations in each iteration are only simple operations such as modulo, subtraction, addition, and multiplication. Algorithm RWA is thus even more lightweight than algorithm RPA, and scheme RWS is ultra-lightweight.

Proposition 17.

The scheme RWS can guarantee the perfect full privacy. ( $Privac y_{f u l l}^{p, 𝒜, R W S} = 1$ .)

Proof.

According to the algorithm, for all $ds_{i} \in DATA_{s}$ with $du_{i} = F(ds_{i}) \in DATA_{u}$, we have $\forall db_{p}, db_{q} \in DATA_{b}, db_{p} \neq db_{q}$, $\Pr\{du_{i} = db_{p}\} = \Pr\{du_{i} = db_{q}\}$. Thus, $Privacy_{DATA_{u}}^{RWA} = 1$. According to the definition of privacy in Definition 14, $Privacy_{full}^{p, 𝒜, RWS} = 1$.

3.4. Distance-Bounded Random Walk with Perturbation Scheme (DBS)

In smart grid, the uploaded data are used as feedback for the future scheduling of distribution and transmission. This requires that the uploaded data accurately represent the power consumption (namely, the sensing data). However, because power distribution and transmission serve not a single SM but a large number of SMs (e.g., at the scale of a campus, a community, or a county), accuracy over such a group of SMs is sufficient for scheduling.

In RPS and RWS, although a bias exists at a single SM (that is, the uploading data are not equal to the sensing data), the uploading data of a large number of SMs can still represent the power consumption at scale. More specifically, the deviation between the sum of the uploading data and the sum of the sensing data is randomly positive or negative at each SM, so the overall sum remains almost unchanged in expectation at large scale. This is explained as follows.

Definition 18.

After the transformation F, the accuracy of $DATA_{u}$ is guaranteed in expectation for a scheduling area (denoted as $Accuracy_{DATA_{u}}^{F} = 1$) if the sum of $DATA_{u}$ equals the sum of $DATA_{s}$ over the scheduling area and the scheduling period. More specifically, suppose that each scheduling period consists of x sensing (uploading) periods and each scheduling area consists of y SMs. The uploading data for them is $SUM_{u} = \sum_{t = 1}^{y} SM_{t}$, where $SM_{t} = \sum_{i = 1}^{x} du_{i}$ for the t-th SM. The sensing data for them is $SUM_{s} = \sum_{t = 1}^{y} SM_{t}$, where $SM_{t} = \sum_{i = 1}^{x} ds_{i}$ for the t-th SM. The accuracy of $DATA_{u}$ is guaranteed if and only if $SUM_{s} = SUM_{u}$.

Proposition 19.

After the transformation $R P A$ or $R W A$ , the accuracy of $D A T A_{u}$ is guaranteed in expectation for a scheduling area. ( $Accurac y_{D A T A_{u}}^{R P A ∥ R W A} = 1$ .)

Proof.

In each sensing (uploading) period, $ds_{i}$ is changed into $du_{i}$ at a single SM, with $δ = ds_{i} - du_{i}$. Suppose that each scheduling period consists of x sensing (uploading) periods and each scheduling area consists of y SMs. The uploading data for them is $SUM_{u} = \sum_{t = 1}^{y} SM_{t}$, where $SM_{t} = \sum_{i = 1}^{x} du_{i}$; the sensing data for them is $SUM_{s} = \sum_{t = 1}^{y} SM_{t}$, where $SM_{t} = \sum_{i = 1}^{x} ds_{i}$. The expectations of both are equal, as the expectation of δ is 0 in a scheduling area. That is, $\bar{SUM_{s}} = \bar{SUM_{u}}$, as $\bar{δ} = 0$, where $\bar{H}$ means the expectation of H.
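The step from $\bar{δ} = 0$ to equal expectations can be made explicit. In the sketch below, the double index $δ_{t,i}$ (the perturbation at the t-th SM in the i-th period) is notation introduced here for illustration.

```latex
\begin{align*}
SUM_s - SUM_u
  &= \sum_{t=1}^{y} \sum_{i=1}^{x} \bigl( ds_{t,i} - du_{t,i} \bigr)
   = \sum_{t=1}^{y} \sum_{i=1}^{x} \delta_{t,i} , \\
\overline{SUM_s - SUM_u}
  &= \sum_{t=1}^{y} \sum_{i=1}^{x} \overline{\delta_{t,i}} .
\end{align*}
```

The expected sums therefore coincide exactly when the perturbations have zero mean over the scheduling area, which is the assumption the proof relies on.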

To further guarantee the scheduling accuracy, we propose a distance-bounded scheme in which the perturbation value (i.e., δ) is bounded. The accuracy is thus guaranteed within a threshold value. The scheme combines the advantages of the former two algorithms, RPA and RWA. A distance-bounded algorithm (DBA) for the transformation F is proposed as follows.

3.4.1. Analysis of Algorithm 3

Algorithm 3: Distance-Bounded Algorithm (DBA).

Require: $DATA_{s}$, $DATA_{b}$, $PRICE$, $BOUND$

Ensure: $DATA_{u}$, $Privacy_{DATA_{u}}^{DBA} = 1$, $Soundness_{DATA_{u}}^{DBA} = 1$.

$BIAS \Leftarrow 0$

for $i = 1$ to $n - 1$ do

$ds_{i} \Leftarrow GetData(DATA_{s})$ //Get the next datum from $DATA_{s}$.

while (1) do

$j \Leftarrow (random() \% (o - 2) + 2)$

$δ \Leftarrow iff(random() \% 2, (ds_{i} - (db_{j} + (db_{j + 1} - db_{j}) / 2)), (ds_{i} - (db_{j} - (db_{j} - db_{j - 1}) / 2)))$

if ($abs(δ) \leq BOUND$) then

$du_{i} \Leftarrow ds_{i} - δ$

$BIAS \Leftarrow BIAS + δ * p_{i}$

exit while

end if

end while

end for

$du_{n} \Leftarrow ds_{n} + BIAS / p_{n}$ //For soundness
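DBA can be sketched in Python along the same lines. Several points are assumptions of this sketch rather than statements of the paper: the function name `dba`; the lower candidate is taken as the midpoint of $db_{j - 1}$ and $db_{j}$; prices are positive; and a hypothetical `max_attempts` parameter replaces the unbounded loop, so that the function raises an error instead of spinning forever when no midpoint lies within BOUND of a sensed value.

```python
import random
from typing import List, Optional, Sequence


def dba(
    data_s: Sequence[float],
    data_b: Sequence[float],
    price: Sequence[float],
    bound: float,
    rng: Optional[random.Random] = None,
    max_attempts: int = 10_000,  # assumed safeguard, not in the pseudocode
) -> List[float]:
    """Distance-bounded random walk: jump to a random midpoint between
    adjacent bad values, accepting only jumps of size at most `bound`."""
    rng = rng or random.Random()
    if not data_s or len(data_s) != len(price):
        raise ValueError("data_s and price must be non-empty and equally long")
    if len(data_b) < 3:
        raise ValueError("data_b needs at least one interior value")
    data_u: List[float] = []
    bias = 0.0
    for ds, p in zip(data_s[:-1], price[:-1]):
        for _ in range(max_attempts):
            j = rng.randrange(1, len(data_b) - 1)  # interior index (0-based)
            if rng.random() < 0.5:
                target = (data_b[j] + data_b[j + 1]) / 2
            else:
                target = (data_b[j - 1] + data_b[j]) / 2
            delta = ds - target
            if abs(delta) <= bound:
                break
        else:
            raise RuntimeError("no midpoint within BOUND of the sensed value")
        data_u.append(ds - delta)
        bias += delta * p
    # The last upload absorbs the accumulated charge difference (soundness).
    data_u.append(data_s[-1] + bias / price[-1])
    return data_u
```

Choosing BOUND at least as large as half the widest gap in `data_b` makes it likely that an acceptable midpoint exists for values inside the bad range; smaller bounds trade availability for accuracy.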

Proposition 20.

After the transformation of algorithm DBA, the soundness of $D A T A_{u}$ is guaranteed. ( $Soundnes s_{D A T A_{u}}^{D B A} = 1$ .)

Proof.

The proof is similar to the proof of Propositions 11 and 15.

Proposition 21.

The scheme DBS is ultra-lightweight.

Proof.

The proof can be reduced to the proof of Propositions 12 and 16.

Proposition 22.

The scheme DBS can guarantee the perfect full privacy. ( $Privac y_{f u l l}^{p, 𝒜, D B S} = 1$ .)

Proof.

The proof is similar to the proof of Propositions 13 and 17.

Proposition 23.

After the transformation $D B A$ , the accuracy of $D A T A_{u}$ is guaranteed in expectation for a scheduling area. $(Accurac y_{D A T A_{u}}^{D B A} = 1 .)$

Proof.

The proof is reduced to the proof of Proposition 19.

Proposition 24.

The sum of the uploading data equals the sum of the sensing data up to a deviation bounded by $α * β * BOUND$, where α is the number of SMs in a schedule area and β is the number of sensing (uploading) periods in a schedule period. (That is, $| SUM_{u} - SUM_{s} | \leq α * β * BOUND$.)

Proof.

The schedule accuracy is the deviation between the sum of the uploading data and the sum of the sensing data. As shown in the proof of Proposition 19, it depends on the number of SMs in the schedule area and the number of sensing (uploading) periods in the schedule period. Its expectation was proved to be 0, as the expectation of δ is 0. For a single schedule period, since every perturbation satisfies $| δ | \leq BOUND$, the maximal bias between the sum of the uploading data and the sum of the sensing data is bounded by $α * β * BOUND$.
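The bound follows from the triangle inequality. In the sketch below, $δ_{t,i}$ denotes the perturbation at the t-th SM in the i-th period (notation introduced here), and it is assumed that every period counted in the schedule period is a perturbed period with $| δ_{t,i} | \leq BOUND$, that is, the compensating last upload of the day is not counted.

```latex
\begin{align*}
\bigl| SUM_u - SUM_s \bigr|
  &= \Bigl| \sum_{t=1}^{\alpha} \sum_{i=1}^{\beta} \delta_{t,i} \Bigr|
   \le \sum_{t=1}^{\alpha} \sum_{i=1}^{\beta} \bigl| \delta_{t,i} \bigr|
   \le \alpha \cdot \beta \cdot BOUND .
\end{align*}
```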

4. Related Work

The security architectures and overall security requirements of smart grid have been discussed in recent years [3, 7]. The privacy issue in smart grid is now starting to attract more attention. The requirements of privacy were explored in previous works [8–11], which pointed out the importance and urgency of privacy issues. Efthymiou and Kalogridis proposed a privacy protection scheme via anonymization of data [12]. Their work relied on escrow and Public Key Infrastructure (PKI); thus its flexibility and scalability may be hampered. Tomosada and Sinohara proposed using virtual energy demand to estimate the energy load and protect consumer privacy [13], but the estimation may incur a large computation overhead, and accuracy may suffer. Lu et al. [10] proposed an efficient and privacy-preserving aggregation scheme (EPPA). Their scheme relied on the homomorphic Paillier cryptosystem and induced a large computation overhead. Cheung et al. [14] proposed a credential-based privacy-preserving power request scheme for smart grid, which relied on an advanced cryptographic primitive, the blind signature. He et al. [15] proposed using homomorphic encryption for smart grid communications. Compared with all the aforementioned related work, our schemes do not rely on any cryptographic primitive, yet they achieve provable privacy and remain ultra-lightweight in computation.

5. Conclusions

In this paper, we proposed three schemes to protect user privacy in smart grid without any cryptographic primitive and with ultra-lightweight computation: the random perturbation scheme (RPS), the random walk scheme (RWS), and the distance-bounded random walk with perturbation scheme (DBS). We also proposed three algorithms, one for each scheme. Our schemes do not rely on any cryptographic computation, are sound in the sense of maintaining the correct utility charge, guarantee privacy that was strictly proved, and ensure scheduling accuracy in power transmission and distribution. All proposed schemes and algorithms were extensively analyzed, which justified their applicability.

Acknowledgments

W. Ren's research was financially supported by National Natural Science Foundation of China (61170217), the Open Research Fund from Shandong provincial Key Laboratory of Computer Network (SDKLCN-2011-01), and Fundamental Research Funds for the Central Universities (CUG110109). Y. Ren's research was sponsored in part by the “Aim for the Top University Project” of the National Chiao Tung University and the Ministry of Education, Taiwan.

References

1. The Smart Grid Interoperability Panel Cyber Security Working Group, "NISTIR 7628 guidelines for smart grid cyber security," Privacy and the smart grid, 2010, http://csrc.nist.gov/publications/nistir/ir7628/nistir-7628_vol2.pdf

2. Khurana, Hadley, and Frincke D. A., "Smart-grid security issues," IEEE Security and Privacy, vol. 8, no. 1, pp. 81–85, 2010. Scopus: 2-s2.0-77249138261. DOI: 10.1109/MSP.2010.49

3. McDaniel and McLaughlin, "Security and privacy challenges in the smart grid," IEEE Security and Privacy, vol. 7, no. 3, pp. 75–77, 2009. Scopus: 2-s2.0-67650822699. DOI: 10.1109/MSP.2009.76

4. Metke A. R. and Ekl R. L., "Security technology for smart grid networks," IEEE Transactions on Smart Grid, vol. 1, no. 1, pp. 99–107, 2010. Scopus: 2-s2.0-77952887452. DOI: 10.1109/TSG.2010.2046347

5. Ericsson G. N., "Cyber security and power system communication - essential parts of a smart grid infrastructure," IEEE Transactions on Power Delivery, vol. 25, no. 3, pp. 1501–1507, 2010. Scopus: 2-s2.0-77954005795. DOI: 10.1109/TPWRD.2010.2046654

6. Vaccaro, Popov, Villacci, and Terzija, "An integrated framework for smart microgrids modeling, monitoring, control, communication, and verification," Proceedings of the IEEE, vol. 99, no. 1, pp. 119–132, 2010. Scopus: 2-s2.0-78149341763. DOI: 10.1109/JPROC.2010.2081651

7. Overman T. M., Sackman R. W., Davis T. L., and Cohen B. S., "High-assurance smart grid: a three-part model for smart grid control systems," Proceedings of the IEEE, vol. 99, no. 6, pp. 1046–1062, 2011. Scopus: 2-s2.0-79956359759. DOI: 10.1109/JPROC.2011.2112310

8. Liu, Xiao, Liang, and Chen, "Cyber security and privacy issues in smart grids," IEEE Communications Surveys Tutorials, vol. 99, pp. 1–17, 2012.

9. Mármol, Sorge, Ugus, and Pérez, "Do not snoop my habits: preserving privacy in the smart grid," IEEE Communications Magazine, vol. 50, no. 5, pp. 166–172, 2012. DOI: 10.1109/MCOM.2012.6194398

10. Liang, Lin, and Shen, "EPPA: an efficient and privacy-preserving aggregation scheme for secure smart grid communications," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 9, pp. 1621–1631, 2012. DOI: 10.1109/TPDS.2012.86

11. Wang, Cui, Que, Choi, Jiang, Cheng, and Xie, "A randomized response model for privacy preserving smart metering," IEEE Transactions on Smart Grid, vol. 3, no. 3, pp. 1317–1324, 2012. DOI: 10.1109/TSG.2012.2192487

12. Efthymiou and Kalogridis, "Smart grid privacy via anonymization of smart metering data," in Proceedings of the 1st IEEE International Conference on Smart Grid Communications (SmartGridComm ′10), pp. 238–243, October 2010.

13. Tomosada and Sinohara, "Virtual energy demand data: estimating energy load and protecting consumers' privacy," in Proceedings of the IEEE PES Innovative Smart Grid Technologies (ISGT ′11), pp. 1–8, January 2011. Scopus: 2-s2.0-79958007491. DOI: 10.1109/ISGT.2011.5759159

14. Cheung, Chim, Yiu, and Hui, "Credential-based privacy-preserving power request scheme for smart grid network," in Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM ′11), pp. 1–5, December 2011.

15. Pun and Kuo, "Secure and efficient cryptosystem for smart grid using homomorphic encryption," in Proceedings of the IEEE PES Innovative Smart Grid Technologies (ISGT ′12), pp. 1–8, January 2012.