Post

1. One-Time Pad, Stream Ciphers and PRGs

1. One-Time Pad, Stream Ciphers and PRGs

Assumptions and Notations

An encryption scheme is defined by 3 algorithms.

  • A (probabilistic) key generation algorithm G.
  • E:K×MC and D:K×CM which are encryption, decryption algorithms respectively.

We assume perfect correctness.

Assumption. kK, mM, if c=E(k,m) then D(k,c)=m with probability 1.

This assumption allows us to assume that the decryption algorithm D:K×CM is deterministic, since it will always give the same output for the same key and ciphertext.

Some random variables:

  • K for the distribution over the key space K.
    • The output from G. We denote this as kK.
  • M for the message being encrypted, and C for the ciphertext.
    • We will write cE(k,m).
  • K and M are required to be independent.

Perfect Secrecy

For a scheme to be perfectly secret, the ciphertext should not reveal any information about the underlying plaintext, so that the adversary learns absolutely nothing about the plaintext.

Definition. An encryption scheme is perfectly secret if for any distribution of M and any ciphertext cC such that Pr[C=c]>0,

Pr[M=mC=c]=Pr[M=m]

for every mM.

For example, the shift cipher with M as the set of all two-letter characters is not perfectly secure. Suppose that c=XX is observed. Then the adversary learns that the plaintext must consist of the same letter, revealing some information.

The above definition is equivalent to the following.

Definition. An encryption scheme is perfectly secret if for any m1,m2M and cC, we have

Pr[E(k,m1)=c]=Pr[E(k,m2)=c]

where the probability is taken over the random choice kK.

Proposition. The above two definitions are equivalent.

Proof. Suppose we are given any distribution on M, any mM with Pr[M=m]>0 and any cC. Since K and M are independent, we have

Pr[C=cM=m]=Pr[E(k,M)=cM=m]=Pr[E(k,m)=cM=m]=Pr[E(k,m)=c],

where E(k,m) denotes the distribution of the ciphertext of m over the key kK. Also,

(1)Pr[M=mC=c]Pr[C=c]=Pr[C=cM=m]Pr[M=m]

by the definition of conditional probability.

() If Pr[M=mC=c]=Pr[M=m], then Pr[C=c]=Pr[C=cM=m] by equation (1). Therefore

Pr[E(k,m)=c]=Pr[C=cM=m]=Pr[C=c]=Pr[C=cM=m]=Pr[E(k,m)=c].

() If Pr[M=m]=0, then we are done. So assume Pr[M=m]>0. Then

Pr[C=c]=mMPr[C=cM=m]Pr[M=m]=mMPr[E(k,m)=c]Pr[M=m]=mMPr[E(k,m)=c]Pr[M=m]=Pr[E(k,m)=c]=Pr[C=cM=m].

Thus equation (1) implies Pr[M=mC=c]=Pr[M=m].

One-Time Pad (OTP)

This is an encryption scheme patented by Vernam in 1917.

  • Set M=K=C={0,1}n.
  • G chooses a key from K according to the uniform distribution.
    • Each key is chosen with probability 2n.
  • E(k,m)=km and D(k,c)=kc.

From the above definition, the correctness of the scheme is easily checked.

Intuitively, km has the uniform distribution over {0,1}n.

Perfect Secrecy of OTP

Theorem. The one-time pad encryption scheme is perfectly secret.

Proof. For any cC and mM with Pr[M=m]>0, we have

Pr[C=cM=m]=Pr[Km=cM=m]=Pr[K=mcM=m]=2n

since K and M are independent. Given any distribution on M, we can see that

Pr[C=c]=mMPr[C=cM=m]Pr[M=m]=2nmMPr[M=m]=2n.

Therefore Pr[M=mC=c]=Pr[M=m] by equation (1).

Here is another proof using the second definition.

Proof. For any mM and cC,

Pr[E(k,m)=c]=Pr[km=c]=Pr[k=cm]=2n.

Thus OTP satisfies the definition of perfect secrecy.

Drawbacks of OTP

The OTP is perfectly secure, but there are some drawbacks to using the OTP in practice.

First of all, OTP is perfectly secure only for one message. Suppose that we reuse the key k for two messages m1, m2. Then since c1=m1k, c2=m2k, we have the following relation

c1c2=(m1k)(m2k)=m1m2.

Since the adversary can see the ciphertext, this kind of relation leaks some information about m1 or m2. For example, the adversary can learn exactly where the two messages differ. So if the key is reused, the scheme cannot be perfectly secret.

Also, the key is (at least) as long as the message. This is why OTP is rarely used today. When sending a long message, two parties must communicate a very long key that is as long as the message, every single time! This makes it hard to manage the key.

Shannon’s Theorem

So is there a way to reduce the key size without losing perfect secrecy? Sadly, no. In fact, the key space must be as least as large as the message space. This is a requirement for perfectly secret schemes.

Theorem. If (G,E,D) is a perfectly secret encryption scheme, then |K||M|.

Proof. Assume |K|<|M|, and give M the uniform distribution over M. Let cC be a ciphertext with Pr[C=c]>0. Define

M(c)={mm=D(k,c) for some kK},

which is the set of all possible decryptions of c. Since we can assume D to be deterministic, |M(c)||K|.1 Then we can choose mMM(c) since |M(c)|<|M|. Then

Pr[M=mC=c]=0Pr[M=m],

so the scheme cannot be perfectly secret.

In words, if |K|<|M|, there is some message mM that is not a decryption of any ciphertext c. Then Pr[M=mC=c] is 0, but Pr[M=m] is not. Thus the scheme cannot be perfectly secret.

Pseudorandom Generators (PRG)

The problem with one-time pad is that we must use very long keys. So the idea of stream ciphers is to replace the random key by a pseudorandom key. The pseudorandom generator (PRG) will compute a pseudorandom key from a seed chosen from a smaller space.

Definition. A pseudorandom generator is an efficient deterministic algorithm G such that given an input seed from {0,1}s, it outputs an element of {0,1}n. Typically, sn.

Stream Ciphers

Then the stream cipher is defined as follows. Note that we have reduced the key size.

  • K={0,1}s, M=C={0,1}n with sn.
  • G:{0,1}s{0,1}n.
  • E(k,m)=G(k)m and D(k,c)=G(k)c.

Since |K|<|M|, stream cipher cannot be perfectly secret.

Security of PRGs

Negligible Functions

A negligible function is a function that tends to 0 as n, but faster than the inverse of any polynomial.

Definition. A function f:NR is negligible if for all c>0, there exists NN such that for all nN, we have |f(n)|<nc.

The following is evident from the definition.

Lemma. A function f:NR is negligible if and only if for all c>0,

limnf(n)nc=0.

In practice, about 230 is non-negligible since it is likely to happen over 1 GB of data. Meanwhile, 280, 2128 are negligible since it is very unlikely to happen over the life of a key.

Unpredictability of PRGs

The adversary will want to distinguish if some bit string is an output of a PRG or truly random. So PRGs must satisfy the notion of unpredictability, which says that no efficient algorithm can predict the next bit of PRGs.

Definition. A PRG is predictable if there exists an efficient adversary A and i<n such that

Pr[A(G(k)[0..i1])=G(k)[i]]>12+ϵ

for non-negligible ϵ>0.

The probability here is taken over k{0,1}s, and G(k)[0..i1] denotes the first i bits of G(k). Also, the probability has to be non-negligible compared to 12, implying that the adversary should be better than random guessing.

A PRG is unpredictable if it is not predictable, meaning that no efficient adversary A can predict the next bit at any position.

Indistinguishability

Statistical Test

Since the stream cipher is not perfectly secret, we need a weaker notion of security. That is, indistinguishability to the adversary.

Let G:{0,1}s{0,1}n be a PRG. Then our goal is that G(k) from k{0,1}s and r{0,1}n are indistinguishable, while they differ in their procedures.

To formally define what it means to be indistinguishable, we consider a statistical test.

Definition. A statistical test on {0,1}n is an algorithm A such that A(x) outputs 0 for not random or 1 for random.

For example, an algorithm A defined as

A(x)=1 if and only if the difference in the number of occurrences of 0 and 1 is less than 10n.

would be a statistical test.

Advantage

Let G:{0,1}s{0,1}n be a PRG and A be a statistical test on {0,1}n.

Definition. The PRG advantage is defined as

AdvPRG[A,G]=|Prk{0,1}s[A(G(k))=1]Prr{0,1}n[A(r)=1]|.

Intuitively, the advantage calculates how well A distinguishes G(k) from truly random bit strings. Recall that A will output 1 if it thinks that the given bit string is random. The first probability term is the case when A is given a pseudorandom string, but A decides that the string is random. (incorrect) The second probability term is the case when A is given a random string and it decides that it is indeed random. (correct) Therefore,

  • If the advantage is close to 0, the probabilities are almost the same, meaning that A cannot distinguish G(k) from a random bit string.
  • If the advantage is close to 1, one of the probabilities is close to 0, meaning that A can distinguish G(k) from a random bit string.2

Secure PRG

Now we can define the security of PRGs.

Definition. G:{0,1}s{0,1}n is a secure PRG if for any efficient statistical test A, AdvPRG[A,G] is negligible.

There are no provably secure PRGs, but we have heuristic candidates, meaning that no such efficient A has been found.

Predictability and Security of PRGs

We can deduce that if a PRG is predictable, then it is insecure.

Theorem. Let G be a PRG. If G is predictable, then it is insecure.

Proof. Let A be an efficient adversary (next bit predictor) that predicts G. Suppose that i is the index chosen by A. With A, we construct a statistical test B such that AdvPRG[B,G] is non-negligible.

mc-01-prg-game.png

  1. The challenger PRG will send a bit string x to B.
    • In experiment 0, PRG gives pseudorandom string G(k).
    • In experiment 1, PRG gives truly random string r.
  2. B gives x[0..i1] to A, then A will do some calculation and return y.
  3. B compares x[i] with y, and returns 1 if x[i]=y, 0 otherwise.

Let Wb be the event that B outputs 1 in experiment b. For b=0, B outputs 1 if A correctly guesses x[i], which happens with probability 12+ϵ for non-negligible ϵ. As for b=1, the received string is truly random. Then the values of x[i] and y are independent so Pr[W1]=12. Therefore,

AdvPRG[B,G]=|Pr[W0]Pr[W1]|=|12+ϵ12|=ϵ,

and the advantage is non-negligible.

Surprisingly, the other way around is also true.

Theorem. (Yao’82) If a PRG G is unpredictable, then G is secure.

The theorem implies that if next bit predictors cannot distinguish G from true random, then no statistical test can.

Security Game Framework

To motivate the definition of semantic security, we consider a security game framework (attack game) between a challenger (ex. the creator of some cryptographic scheme) and an adversary A (ex. attacker of the scheme).

mc-01-ss.png

Definition. Let E=(G,E,D) be a cipher defined over (K,M,C). For a given adversary A, we define two experiments 0 and 1. For b{0,1}, define experiment b as follows:

Experiment b.

  1. The adversary computes m0,m1M and sends them to the challenger.
  2. The challenger computes kK, cE(k,mb) and sends c to the adversary.
  3. The adversary outputs a bit b{0,1}.

Let Wb be the event that A outputs 1 in experiment b. i.e, the event that A(EXP(b))=1. Now define the semantic security advantage of A with respect to E as

AdvSS[A,E]=|Pr[W0]Pr[W1]|.

As we understood the advantage of PRG, semantic security advantage can be understood the same way. If the advantage is closer to 1, the better the adversary distinguishes the two experiments.

Distinguishing Distributions

In the same way, we can define a security game for distinguishing two distributions.

Definition. Let P0, P1 be two distributions over a set S. For any given efficient adversary A, define experiments 0 and 1.

Experiment b.

  1. The challenger computes xPb and sends x to the adversary.
  2. The adversary computes and outputs a bit b{0,1}.

Let Wb the event that A outputs 1 in experiment b. Then the advantage is defined as

Adv[A]=|Pr[W0]Pr[W1]|

If the advantage is negligible, we say that P0 and P1 are computationally indistinguishable, and write P0cP1.

As an example, a PRG G is secure if the two distributions G(k) over k{0,1}s and r{0,1}n are computationally indistinguishable.

Semantic Security

So now we can define a semantically secure encryption scheme.

Definition. An encryption scheme E is semantically secure if for any efficient adversary A, its advantage AdvSS[A,E] is negligible.

It means that the adversary cannot distinguish whether the received message is an encryption of m0 or m1, even though it knows what the original messages were in the first place!

Using the notion of computational indistinguishability, E is semantically secure if for any m0,m1M, the distribution of ciphertexts of m0 and m1 with respect to kK is computationally indistinguishable.

E(K,m0)cE(K,m1)

Semantic Security of the Stream Cipher

Theorem. If G:{0,1}s{0,1}n is a secure PRG, then the stream cipher E constructed from G is semantically secure.

Proof. Let A be an efficient adversary that breaks the semantic security of E. We will use A to construct a statistical test B that breaks the security of the PRG.

Since A can break the semantic security of the stream cipher, for m0,m1M chosen by A, it can distinguish m0G(k) and m1G(k). Let Wb be the event that A returns 1 for mbG(k). The advantage AdvSS[A,E]=|Pr[W0]Pr[W1]| is non-negligible.

Define two new experiments as follows:

  1. The adversary A gives two messages m0,m1M.
  2. The challenger draws a random string r{0,1}n.
  3. In experiment b, return mbr.
  4. A will return b{0,1}.

Let Wb be the event that A returns 1 for mbr. Then, by triangle inequality,

(2)AdvSS[A,E]=|Pr[W0]Pr[W1]||Pr[W0]Pr[W0]|+|Pr[W0]Pr[W1]|+|Pr[W1]Pr[W1]|=|Pr[W0]Pr[W0]|+|Pr[W1]Pr[W1]|.

The last equality holds since OTP is perfectly secure and thus |Pr[W0]Pr[W1]|=0. Since AdvSS[A,E] is non-negligible, at least one of the terms on the right hand side should also be non-negligible.

Without loss of generality, assume that |Pr[W0]Pr[W0]| is non-negligible. This implies that A can distinguish m0G(k) from m0r. Using this fact, we can construct a statistical test B for the PRG as follows.

  1. Challenger PRG gives a bit string x.
    • In experiment 0, challenger gives pseudorandom string G(k).
    • In experiment 1, challenger gives truly random string r.
  2. Invoke A, then A will send two messages m0,m1M.
  3. Compute c=m0x and return c to A.
  4. A will return b, and return b directly to challenger PRG.

Let Yb the event that B returns 1 on experiment b. Then, we directly see that

Pr[Y0]=Pr[W0],Pr[Y1]=Pr[W0].

Therefore, the PRG advantage of B is

AdvPRG[B,G]=|Pr[Y0]Pr[Y1]|=|Pr[W0]Pr[W0]|,

which is non-negligible, so it breaks the security of the PRG.

Corollary. For any adversary A for the stream cipher E, there exists an adversary B for a PRG G such that

AdvSS[A,E]2AdvPRG[B,G].

Proof. Use equation (2) in the above proof.

This theorem tells use that the (semantic) security of the stream cipher relies on the security of the PRGs.3 Thus we conclude that it is important to construct a secure PRG.

  1. Some pair of keys may give the same decryption of c↩︎

  2. If Pr[A(r)=1]0, then A can distinguish pseudorandom from true random, although it is almost always incorrect. ↩︎

  3. Note that the meaning of security is different for stream ciphers and PRGs. ↩︎

This post is licensed under CC BY 4.0 by the author.