
\documentclass[a4paper]{article}
\usepackage[english]{babel}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{color}
\usepackage{units}

\newcommand{\TODO}{\textcolor{red}{TO DO}}

\begin{document}

\begin{center}
  \textbf{\Large NWI-IMC061 -- Applied Cryptography}\\[4pt]

  \textbf{\large Final Exam, Academic Year 2021--2022}
\end{center}

\bigskip
\hrule
\bigskip

\noindent \textbf{Last Name:} Eidelpes

\medskip\noindent \textbf{First Name:} Tobias

\medskip\noindent \textbf{Student Number:} s1090746

\medskip\noindent \textbf{Personalized Appendix Sequence Number:} 30

\bigskip
\hrule
\bigskip

\begin{enumerate}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% SYMMETRIC - LITERATURE %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \item \textbf{(18 points)}
  \begin{enumerate}

    \item EWCDM stands for \emph{Encrypted Wegman-Carter with Davies-Meyer}. As
      the name implies, EWCDM is based on a Wegman-Carter construction which
      takes the hash of a message $M$ and XORes it with the application of a
      pseudorandom function (PRF) to a nonce $N$. This construction is very
      efficient and also has a strong security bound. However, it is very
      vulnerable to \emph{nonce-misuse}. To deal with that problem, the
      Wegman-Carter construction is wrapped by another call to the PRF with a
      different key. Another problem is that true PRFs are hard to come by in
      practice, so pseudorandom permutations (i.e., block ciphers) are used
      instead. If a pseudorandom permutation is used in place of the PRF, the
      security bound of the construction drops to the birthday bound
      ($2^{n/2}$ queries for an $n$-bit block cipher). The authors therefore
      replace the inner call to the PRF with the \emph{Davies-Meyer}
      construction
      \[ \mathrm{DM}[E]_K(N) = E_K(N)\oplus N, \]
      XOR its output with the hash $H_{K_h}(M)$ of the message and encrypt the
      result with a second block cipher call under an independent key $K'$.
      The resulting EWCDM construction computes the tag
      \[ E_{K'}(E_K(N)\oplus N\oplus H_{K_h}(M)) \]
      and is secure \emph{beyond} the birthday bound against nonce-respecting
      adversaries while still offering birthday bound security against
      nonce-misusing adversaries.
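      To make the data flow of the construction concrete, the following
      minimal Python sketch computes an EWCDM-style tag. It rests on loudly
      stated assumptions that are not from the paper: the block cipher $E$ is
      replaced by a key-seeded random permutation of 8-bit blocks (the
      hypothetical, insecure helper \texttt{toy\_prp}), and $H$ is a
      hypothetical polynomial hash over $\mathrm{GF}(2^8)$ with the AES
      reduction polynomial that appends the message length as a final block.
      It only illustrates the formula
      $E_{K'}(E_K(N)\oplus N\oplus H_{K_h}(M))$ and offers no security.
\begin{verbatim}
import random

N_BITS = 8                  # toy block size n (a real instantiation uses e.g. n = 128)
MASK = (1 << N_BITS) - 1
AES_POLY = 0x11B            # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def toy_prp(key):
    """Key-seeded random permutation of 8-bit blocks: (forward, inverse) tables.

    Hypothetical stand-in for a block cipher E_K; NOT secure.
    """
    forward = list(range(1 << N_BITS))
    random.Random(key).shuffle(forward)
    inverse = [0] * len(forward)
    for x, y in enumerate(forward):
        inverse[y] = x
    return forward, inverse


def gf_mul(a, b):
    """Multiply two elements of GF(2^8) modulo AES_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N_BITS):
            a ^= AES_POLY
    return result


def poly_hash(h, message):
    """Hypothetical polynomial hash over GF(2^8): one byte per block, length appended."""
    acc = 0
    for block in message + bytes([len(message) & MASK]):
        acc = gf_mul(acc ^ block, h)
    return acc


def ewcdm_tag(k, k_prime, k_h, nonce, message):
    """T = E_{K'}(E_K(N) xor N xor H_{K_h}(M))."""
    e_k, _ = toy_prp(k)
    e_k_prime, _ = toy_prp(k_prime)
    inner = e_k[nonce] ^ nonce ^ poly_hash(k_h, message)
    return e_k_prime[inner]


def ewcdm_verify(k, k_prime, k_h, nonce, message, tag):
    """Verification by recomputing the tag."""
    return ewcdm_tag(k, k_prime, k_h, nonce, message) == tag


tag = ewcdm_tag(1, 2, 0x57, 42, b"hello")
print(tag, ewcdm_verify(1, 2, 0x57, 42, b"hello", tag))
\end{verbatim}
      Verification simply recomputes the tag. Because $E_{K'}$ is a
      permutation, two tags for the same nonce collide exactly when the inner
      values $E_K(N)\oplus N\oplus H_{K_h}(M)$ collide.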

    \item The type of symmetric cryptographic scheme introduced is a Message
      Authentication Code (MAC).

    \item In total, three keys are needed: two independent block cipher keys
      $K$ and $K'$ and one key $K_h$ for the keyed hash function. Their sizes
      depend on the chosen block cipher and hash function.

    \item EWCDM takes a nonce $N$ of one block ($n$ bits) and a message $M$ of
      variable length. The message is only processed by the keyed hash
      function $H_{K_h}$, which compresses it to an $n$-bit value, so any
      padding of $M$ is handled inside $H$; the block cipher only ever
      processes $n$-bit values.

    \item The output is a tag of fixed length: one block ($n$ bits),
      independent of the message length, since it is the output of a single
      block cipher call $E_{K'}$.

    \item EWCDM is based on a pseudorandom permutation (i.e., a block cipher)
      and an almost XOR-universal (AXU) keyed hash function.

    \item Yes, the authors provide a security proof. The proof replaces $E_K$
      and $E_{K'}$ by independent uniformly random permutations, i.e., it
      assumes that $E$ is a secure pseudorandom permutation, and bounds the
      advantage of a computationally unbounded distinguisher that never
      repeats a query. Separate bounds are given for nonce-respecting and
      nonce-misusing adversaries.

    \item The practical relevance is high, in my opinion, because the EWCDM
      construction remains secure up to the birthday bound even against
      nonce-misusing adversaries. Implementing nonce generation correctly is
      known to be difficult in practice, and if a scheme is completely broken
      by wrong handling of nonces, there is no \emph{fallback} security
      guarantee. EWCDM provides exactly such a \emph{fallback} guarantee.

    \item Poly1305 is also a message authentication code (MAC), which we
      discussed in the lecture.

    \item One advantage of EWCDM over Poly1305 is that the former is
      nonce-misuse resistant up to the birthday bound while Poly1305 is not.

    \item One disadvantage of EWCDM is that it requires two calls to the
      underlying block cipher. This can have potentially serious performance
      implications for small, low-resource embedded devices.

  \end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% SYMMETRIC - KEYED %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \item \textbf{(16 points)}
  \begin{enumerate}

    \item $\mathsf{CrAp}_K^{-1}$ takes the ciphertext blocks
      $C_1,\dots,C_l$ and passes each of them to the decryption function
      $\widetilde{E}^{-1}(K,N,\cdot)$, which maps a 128-bit input to a 128-bit
      output. Each output has to be stripped of the counter (its last 26 bits)
      to obtain the 102-bit message blocks $M_1,\dots,M_l$. Finally, the
      padding (if any) has to be removed from $M_1,\dots,M_l$ to obtain the
      original message $M$.

    \item The length of the message $M$ is limited by the counter, which is at
      most 26 bits long. Since the very first counter ($\langle 0\rangle_{26}$)
      is reserved for the tag, $2^{26}-2$ message blocks remain. Every block
      (without the counter) is at most 102 bits long which gives a maximum
      message length of $102\cdot (2^{26}-2) = \unit[6845103924]{bits}$.

    \item $\widetilde{E}$ should behave like a (tweakable) pseudorandom
      permutation in order to be able to prove the security of
      $\mathsf{CrAp}$. If it does not, a distinguisher can gain a significant
      advantage, because the outputs of $\widetilde{E}$ are then
      distinguishable from those of a random permutation. In other words, if
      the security of the underlying primitive is broken, the whole scheme
      falls apart.

    \item \TODO

    \item \TODO

    \item The random nonce $N$ is $\unit[96]{bits}$ long. By the birthday
      paradox, an attacker expects a repeated nonce after roughly
      $2^{96/2} = 2^{48}$ evaluations.
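      The birthday estimate can be checked empirically on a scaled-down
      nonce. The Python sketch below assumes 20-bit nonces instead of 96-bit
      ones (simulating $2^{48}$ draws is infeasible) and counts how many
      uniformly random nonces are drawn until the first repetition. The
      average is of the order $2^{n/2}$; more precisely, the expected number
      of draws is about $\sqrt{\pi/2\cdot 2^{n}}\approx 1.25\cdot 2^{n/2}$.
\begin{verbatim}
import math
import random


def draws_until_repeat(bits, rng):
    """Draw uniform `bits`-bit nonces until one repeats; return the number of draws."""
    seen = set()
    while True:
        nonce = rng.getrandbits(bits)
        if nonce in seen:
            return len(seen) + 1
        seen.add(nonce)


def average_draws(bits, trials, seed=0):
    """Average number of draws until the first repeated nonce over several trials."""
    rng = random.Random(seed)
    return sum(draws_until_repeat(bits, rng) for _ in range(trials)) / trials


def birthday_estimate(bits):
    """Approximate expected number of draws: sqrt(pi/2 * 2^bits)."""
    return math.sqrt(math.pi / 2 * 2 ** bits)


# Scaled-down stand-in for the 96-bit nonce (2^48 draws cannot be simulated).
BITS = 20
print(average_draws(BITS, 200), 2 ** (BITS / 2), birthday_estimate(BITS))
\end{verbatim}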

    \item The tag $T$ is $\unit[62]{bits}$ long, so there are only
      $2^b = 2^{62}$ possible tags. The distinguisher keeps the nonce and the
      ciphertext fixed and submits them together with every possible tag
      value in turn, incrementing the tag by one after each rejection. In the
      real world, the valid tag for this ciphertext is among these $2^{62}$
      candidates and is accepted after at most $2^{62}$ queries, whereas in
      the ideal world every forgery attempt is rejected. Once a tag is
      accepted, the distinguisher knows that it is in the real world; if all
      $2^{62}$ attempts are rejected, it is in the ideal world.
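      The exhaustive tag search can be sketched with a scaled-down tag
      length. In the Python sketch below, the 62-bit tag is replaced by a
      12-bit tag, and the real-world tag is a hypothetical truncated
      HMAC-SHA256 (\texttt{toy\_tag}), not the tag computed by
      $\mathsf{CrAp}$. The ideal world is modelled as an oracle that rejects
      every forgery attempt.
\begin{verbatim}
import hashlib
import hmac

TAG_BITS = 12  # scaled-down stand-in for the 62-bit tag


def toy_tag(key, nonce, ciphertext):
    """Hypothetical b-bit tag: truncated HMAC-SHA256 (not the CrAp tag)."""
    digest = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - TAG_BITS)


def real_oracle(key):
    """Real-world decryption oracle: accepts iff the tag is valid."""
    return lambda nonce, ciphertext, tag: tag == toy_tag(key, nonce, ciphertext)


def ideal_oracle():
    """Ideal world: every forgery attempt is rejected."""
    return lambda nonce, ciphertext, tag: False


def distinguish(oracle, nonce=b"n" * 12, ciphertext=b"c" * 16):
    """Try every possible tag; an accepted forgery reveals the real world."""
    for queries, tag in enumerate(range(2 ** TAG_BITS), start=1):
        if oracle(nonce, ciphertext, tag):
            return "real", queries
    return "ideal", 2 ** TAG_BITS


print(distinguish(real_oracle(b"secret key")), distinguish(ideal_oracle()))
\end{verbatim}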

    \item \TODO

  \end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% SYMMETRIC - UNKEYED %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \item \textbf{(16 points)}
  \begin{enumerate}

    \item The chaining value size is $\unit[101]{bits}$ ($=g$) and the message
      block size is $655-101=\unit[554]{bits}$.

    \item If the message has size $|M|=\unit[1234567]{bits}$ and the block
      size is $\unit[554]{bits}$, we need $\lceil 1234567/554\rceil = 2229$
      blocks, i.e., $2229\cdot 554 = \unit[1234866]{bits}$, which requires a
      padding of $1234866-1234567=\unit[299]{bits}$. Additionally, the
      message length is encoded in a separate $\unit[554]{bit}$ block, so the
      total number of blocks is $2230$. Each block is processed by one
      evaluation of the compression function, i.e., one evaluation of $P$, so
      $2230$ evaluations of $P$ are needed.
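      The block count can be reproduced with a few lines of Python. The
      sketch assumes exactly the padding model used above (padding up to the
      block boundary plus one separate length block); a padding rule that
      always appends at least one bit could give a different count for
      messages whose length is a multiple of 554.
\begin{verbatim}
def p_evaluations(message_bits, block_bits=554):
    """Return (evaluations of P, padding bits).

    Assumes the padding model from the answer: the message is padded to a
    multiple of the block size and one extra block encodes the length.
    """
    message_blocks = -(-message_bits // block_bits)  # ceiling division
    padding_bits = message_blocks * block_bits - message_bits
    return message_blocks + 1, padding_bits


print(p_evaluations(1234567))
\end{verbatim}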

    \item In order for $(x,y)$ to yield a valid preimage for the compression
      function $F^P$, $x$ must be of size $\unit[800]{bits}$ and contain 55
      zeros at the beginning and 90 zeros at the end. The 655 bits in between
      are free for the adversary to choose. Similarly, $y$ must be of size
      $\unit[800]{bits}$, where the first 50 and the last 649 bits are
      discarded. The 101 bits in between must all be zero to match the target
      image $0^{101}$. Furthermore, $(x,y)$ must be a valid query pair for the
      permutation, i.e., $P(x)=y$, where $x$ and $y$ satisfy the
      aforementioned conditions.
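      The conditions on $(x,y)$ can be written as a small Python predicate.
      The sketch assumes that the 800-bit strings are represented as integers
      with the first bit as the most significant bit, and it takes the
      permutation as a parameter \texttt{P} because $P$ itself is not given
      here.
\begin{verbatim}
WIDTH = 800                            # width of the permutation P in bits
LEAD_X, TAIL_X = 55, 90                # x = 0^55 || 655 free bits || 0^90
LEAD_Y, TAIL_Y = 50, 649               # y = 50 bits || 101 target bits || 649 bits
TARGET_BITS = WIDTH - LEAD_Y - TAIL_Y  # 101 bits that must all be zero


def is_valid_input(x):
    """x has 55 leading and 90 trailing zero bits (first bit = most significant)."""
    return 0 <= x < 2 ** WIDTH and x >> (WIDTH - LEAD_X) == 0 and x % 2 ** TAIL_X == 0


def hits_target(y):
    """The 101 bits after the first 50 bits of y are all zero."""
    return 0 <= y < 2 ** WIDTH and (y >> TAIL_Y) % 2 ** TARGET_BITS == 0


def is_preimage_pair(x, y, P):
    """(x, y) yields a preimage of 0^101 if both conditions hold and P(x) = y."""
    return is_valid_input(x) and hits_target(y) and P(x) == y
\end{verbatim}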

    \item If the adversary makes one forward query $P(x)$ with $x$ of the
      required form, the 101 output bits that form the chaining value are
      (close to) uniformly random, so the query hits the target image
      $0^{101}$ with probability about $1/2^{g} = 1/2^{101}$.

    \item If the adversary makes one inverse query $P^{-1}(y)$ with $y$ of the
      required form, the result is a valid preimage only if its first
      $\unit[55]{bits}$ and its last $\unit[90]{bits}$ are all zero, which
      happens with probability about $1/2^{55+90} = 1/2^{145}$.

    \item Inverse queries do not help the adversary: each inverse query
      succeeds with probability about $1/2^{145}$, which is much smaller than
      the $1/2^{101}$ of a forward query, so an optimal adversary uses forward
      queries only. Its success probability is therefore about $1/2^{101}$
      per query (and, by a union bound, at most about $q/2^{101}$ after $q$
      queries).
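      The gap between forward and inverse queries can be illustrated on a
      scaled-down toy instance. The Python sketch below is a hypothetical
      analogue, not the exam's parameters: a random permutation on 12-bit
      strings, inputs that must start and end with 3 zero bits (6 constrained
      bits, playing the role of $55+90$), and outputs whose middle 4 bits must
      be zero (playing the role of the 101 target bits). It counts the
      successful queries over all admissible forward and inverse queries.
\begin{verbatim}
import random

W = 12             # toy permutation width in bits (the real P has 800)
A, B = 3, 3        # inputs: A leading and B trailing zero bits (cf. 55 and 90)
C, G, D = 2, 4, 6  # outputs: C free bits || G target zero bits || D free bits
assert C + G + D == W


def valid_input(x):
    """x starts with A zero bits and ends with B zero bits."""
    return x >> (W - A) == 0 and x % (1 << B) == 0


def hits_target(y):
    """The G middle bits of y (after the first C bits) are zero."""
    return (y >> D) % (1 << G) == 0


def count_hits(seed=0):
    """Return (forward hits, forward queries, inverse hits, inverse queries)."""
    perm = list(range(1 << W))
    random.Random(seed).shuffle(perm)  # toy random permutation P
    inv = [0] * len(perm)
    for x, y in enumerate(perm):
        inv[y] = x                     # P^{-1}
    xs = [x for x in range(1 << W) if valid_input(x)]
    ys = [y for y in range(1 << W) if hits_target(y)]
    forward_hits = sum(hits_target(perm[x]) for x in xs)
    inverse_hits = sum(valid_input(inv[y]) for y in ys)
    return forward_hits, len(xs), inverse_hits, len(ys)


fwd, n_fwd, inv_hits, n_inv = count_hits()
print(f"forward: {fwd}/{n_fwd}, inverse: {inv_hits}/{n_inv}")
\end{verbatim}
      Both directions find exactly the same set of valid pairs, so the two hit
      counts are always equal. The difference lies in the per-query success
      probability, roughly $2^{-4}$ for a forward query versus $2^{-6}$ for an
      inverse query, mirroring $2^{-101}$ versus $2^{-145}$.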

  \end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% ASYMMETRIC - LITERATURE %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \item \textbf{(17 points)}
  \begin{enumerate}

    \item LEDAcrypt is a post-quantum asymmetric suite of cryptosystems. It
      contains a public-key encryption scheme and a key-encapsulation mechanism
      (KEM). The underlying hard problem (arbitrary linear binary code decoding)
      is currently believed to be secure against quantum adversaries.

    \item The authors introduce a post-quantum public-key cryptosystem based on
      linear codes.

    \item IND-CCA2 is proven for the PKC and for the KEM with long-term keys.
      IND-CPA is proven for the KEM with ephemeral keys.

    \item LEDAcrypt is based on the hardness of the (syndrome) decoding problem
      for linear codes. Given a parity-check matrix $H$ and a received word
      $y$, the syndrome is $s=yH$. The best estimate for the transmitted
      codeword is $x=y+z_0$, where $z_0$ is a minimum-weight solution of the
      equation $s=zH$. Finding such a minimum-weight solution given only $s$
      and $H$ is $\mathsf{NP}$-hard for general linear codes.
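      The decoding problem can be illustrated by brute force on a tiny code.
      The Python sketch below uses the $[7,4]$ Hamming code as a hypothetical
      example (LEDAcrypt itself uses much longer QC-LDPC codes) and writes its
      parity-check matrix as a $7\times 3$ matrix so that $s=yH$ as in the
      text. The search enumerates candidate error vectors by increasing
      weight, which is exponential in the code length and hence infeasible at
      real parameter sizes.
\begin{verbatim}
from itertools import combinations

# Parity-check matrix of the [7,4] Hamming code, written as a 7x3 matrix whose
# i-th row is the binary representation of i + 1, so that s = yH as in the text.
H = [[(i >> k) & 1 for k in (2, 1, 0)] for i in range(1, 8)]


def syndrome(vec, H):
    """s = vec * H over GF(2) (row vector times matrix)."""
    return tuple(
        sum(v * row[j] for v, row in zip(vec, H)) % 2 for j in range(len(H[0]))
    )


def min_weight_solution(s, H):
    """Brute-force a minimum-weight z with zH = s (exponential in the length n)."""
    n = len(H)
    for weight in range(n + 1):
        for support in combinations(range(n), weight):
            z = [1 if i in support else 0 for i in range(n)]
            if syndrome(z, H) == s:
                return z
    return None  # no solution (cannot happen for this full-rank H)


def decode(y, H):
    """Estimate the transmitted codeword as x = y + z0 over GF(2)."""
    z0 = min_weight_solution(syndrome(y, H), H)
    return [(a + b) % 2 for a, b in zip(y, z0)]


# Codeword 1110000 with an error in the last position.
print(decode([1, 1, 1, 0, 0, 0, 1], H))
\end{verbatim}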

    \item The private key in LEDAcrypt consists of two binary matrices $Q$ and
      $H$, and the public key is derived from their product $L=Q\cdot H$. The
      security of the scheme relies on the fact that recovering the
      information from a perturbed codeword is hard unless this factorization
      of $L$ is known. If the general decoding problem for linear codes had a
      polynomial-time solution, an attacker could decode ciphertexts directly
      without knowing the factorization, and the scheme would be broken.

    \item The strongest type of security the authors claim to achieve is
      IND-CCA2. The authors use the Fujisaki-Okamoto transform to achieve
      IND-CCA2 security.

    \item The scheme can be used to exchange symmetric keys between parties
      using the key encapsulation mechanism (KEM). In that scenario, the
      sender runs the encapsulation algorithm on the receiver's public key,
      which outputs a fresh symmetric key together with its encapsulation (a
      ciphertext), and sends the encapsulation to the receiver. The receiver
      runs the decapsulation algorithm with its private key to recover the
      same symmetric key, which both parties can then use for further
      communication.

    \item The lowest security level treated by the authors is NIST security
      level 1, which corresponds to the security of AES-128. The parameters
      depend on whether the scheme is used with ephemeral or long-term keys
      and on the number of circulant blocks $n_0$ (which determines the code
      rate $(n_0-1)/n_0$). For ephemeral keys with $n_0=2$ the authors suggest
      $p=14{,}939$, $t=136$, $d_v=11$ and $m=[4,3]$. For long-term keys they
      suggest $p=35{,}899$, $t=136$, $d_v=9$, $m=[5,4]$, $\overline{t}=4$ and
      $b_0=44$. These parameters are chosen with respect to an adversary using
      Information Set Decoding (ISD) to solve the underlying hard problem.

    \item The size for ephemeral keys is $\unit[452]{bytes}$ (in memory) for the
      private key and $\unit[1872]{bytes}$ for the public key. The size for
      long-term keys is $\unit[468]{bytes}$ (in memory) for the private key and
      $\unit[4488]{bytes}$ for the public key.

    \item Kyber512 is also a KEM and achieves the same level of (classical)
      security.

    \item One advantage of LEDAcrypt is that the key sizes are relatively small
      compared to Classic McEliece, for example. Small key sizes are important
      for transmission of public keys so that they can fit in commonly used
      packet sizes.

    \item One disadvantage of the scheme is that it inherently has a non-zero
      decoding failure rate (DFR). For ephemeral keys and the lowest security
      level, the authors report $14$ decoding failures out of
      $1.2\cdot 10^9$ decodings. The DFR can be lowered by choosing different
      parameters, but the rate is arguably still too high for practical use.

      For long-term keys, the authors state that 95 out of 100 keys (lowest
      security level) achieve a DFR of $2^{-64}$, which is arguably still too
      high for extended use well into the future.
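      For comparison with the long-term figure, the reported ephemeral-key
      result can be converted into a power of two. This short Python
      computation uses only the numbers quoted above and gives a DFR of
      roughly $1.2\cdot 10^{-8}\approx 2^{-26}$, far above $2^{-64}$.
\begin{verbatim}
import math

failures, decodings = 14, 1.2e9   # figures reported for ephemeral keys
dfr = failures / decodings        # empirical decoding failure rate
print(f"DFR = {dfr:.2e} = 2^{math.log2(dfr):.1f}")
\end{verbatim}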

  \end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% ASYMMETRIC - SECURITY %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \item \textbf{(33 points)}
  \begin{enumerate}

    \item Let there be an adversary $\mathcal{A}$ which breaks CGI. We can then
      construct an adversary $\mathcal{B}$ which breaks CGI2.

      Suppose $\mathcal{B}$ is given a CGI2 instance
      $(\mathcal{G}_a,\mathcal{G}_b)$ where $a\neq b$ and $\mathcal{G}_a$ and
      $\mathcal{G}_b$ are in the set of $2^{130}$ graphs isomorphic to
      $\mathcal{G}$. The goal of $\mathcal{B}$ is to find, with
      non-negligible probability, an isomorphism $\phi$ such that
      $\mathcal{G}_a = \phi(\mathcal{G}_b)$. $\mathcal{B}$ gives
      $(\mathcal{G}_a,\mathcal{G}_b)$ to $\mathcal{A}$, and $\mathcal{A}$
      outputs an isomorphism $\phi$ which satisfies
      $\mathcal{G}_a = \phi(\mathcal{G}_b)$. $\mathcal{B}$ then outputs $\phi$
      as the solution to its own instance, so $\mathcal{B}$ succeeds with the
      same probability as $\mathcal{A}$.

    \item First, the prover samples a random isomorphism $\psi$ and computes
      the commitment $G'=\psi(\mathcal{G})$, a permuted copy of
      $\mathcal{G}$, which it sends to the verifier. The verifier then picks a
      random challenge $\mathsf{ch}\in\{0,\dots,2^{130}-1\}$, i.e., one of the
      public graphs $G_{\mathsf{ch}}=\phi_{\mathsf{ch}}(\mathcal{G})$
      isomorphic to $\mathcal{G}$, and sends it to the prover. Using its
      secret $\phi_{\mathsf{ch}}$, the prover computes the response
      $\mathsf{resp}=\psi\circ\phi_{\mathsf{ch}}^{-1}$, which maps
      $G_{\mathsf{ch}}$ to the commitment $G'$, and sends it to the verifier.
      The verifier accepts if $\mathsf{resp}(G_{\mathsf{ch}})=G'$ and rejects
      otherwise.
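      One round of the protocol can be simulated on small graphs. The Python
      sketch below is a toy under stated assumptions: graphs have 8 vertices
      instead of 1107, there are only 4 public graphs instead of $2^{130}$,
      permutations are lists in which \texttt{perm[v]} is the image of vertex
      \texttt{v}, and all helper names are hypothetical. It checks that an
      honest prover is always accepted.
\begin{verbatim}
import random

N = 8  # toy number of vertices (the exam uses 1107)


def random_perm(rng):
    """Uniformly random permutation, as a list perm[v] = image of v."""
    perm = list(range(N))
    rng.shuffle(perm)
    return perm


def relabel(perm, graph):
    """Relabel every edge {u, v} as {perm[u], perm[v]}."""
    return frozenset(frozenset(perm[v] for v in edge) for edge in graph)


def compose(a, b):
    """(a o b)(v) = a(b(v)): apply b first, then a."""
    return [a[b[v]] for v in range(N)]


def inverse(perm):
    inv = [0] * N
    for v, w in enumerate(perm):
        inv[w] = v
    return inv


def keygen(base_graph, num_graphs, rng):
    """Secret isomorphisms phi_i and public graphs G_i = phi_i(G)."""
    phis = [random_perm(rng) for _ in range(num_graphs)]
    return phis, [relabel(phi, base_graph) for phi in phis]


def run_round(base_graph, phis, public, rng):
    """One honest execution; returns the verifier's decision."""
    psi = random_perm(rng)
    commitment = relabel(psi, base_graph)          # prover: G' = psi(G)
    ch = rng.randrange(len(public))                # verifier: random challenge
    resp = compose(psi, inverse(phis[ch]))         # prover: psi o phi_ch^{-1}
    return relabel(resp, public[ch]) == commitment  # verifier: resp(G_ch) == G'?


rng = random.Random(2022)
G = frozenset(frozenset((i, (i + 1) % N)) for i in range(N)) | {frozenset((0, 4))}
phis, public = keygen(G, 4, rng)
print(all(run_round(G, phis, public, rng) for _ in range(10)))
\end{verbatim}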

    \item The domain of the commitment scheme is the set of graphs isomorphic
      to $\mathcal{G}$, and the range is likewise the set of graphs
      isomorphic to $\mathcal{G}$. The scheme consists of three phases:
      setup, commitment and opening. In the setup phase, a random permutation
      $\psi$ is chosen uniformly from the set of isomorphisms on
      $\mathcal{G}$. The commitment phase takes $\psi$ and the graph
      $\mathcal{G}$ as input and produces the commitment
      $\mathcal{G}'=\psi(\mathcal{G})$. The opening phase takes an isomorphism
      $\mathsf{resp}$, a graph $\mathcal{G}_{\mathsf{ch}}$ isomorphic to
      $\mathcal{G}$ and the original commitment $\mathcal{G}'$ as input, and
      outputs $\top$ if $\mathsf{resp}(\mathcal{G}_{\mathsf{ch}})=\mathcal{G}'$
      and $\bot$ otherwise.

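    \item Computational binding: suppose an adversary finds two openings of
      the same commitment to different graphs, i.e.,
      $\mathsf{Comm}(\psi_0,\mathcal{G}_0) = \mathsf{Comm}(\psi_1,\mathcal{G}_1)$
      with $\mathcal{G}_0\neq\mathcal{G}_1$. This means that
      $\psi_0(\mathcal{G}_0) = \psi_1(\mathcal{G}_1)$, so
      $\psi_1^{-1}\circ\psi_0$ maps $\mathcal{G}_0$ to $\mathcal{G}_1$. The
      adversary has thus found an isomorphism between two different graphs,
      which corresponds to solving the CGI problem.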

    \item The honest prover answers with
      $\mathsf{resp}=\psi\circ\phi_{ch}^{-1}$. If $G_{ch}=\phi_{ch}(G)$ and
      $G'=\psi(G)$, it follows that $G=\phi_{ch}^{-1}(G_{ch})$ and therefore
      $\mathsf{resp}(G_{ch})=\psi(\phi_{ch}^{-1}(G_{ch}))=G'$, so the verifier
      always accepts.

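    \item Suppose the prover $\mathcal{P}$ does not know the secret
      isomorphisms. It can still prepare in advance for a single challenge
      $ch^*$: it picks a random isomorphism $\psi$, commits to
      $G'=\psi(G_{ch^*})$ and, if the challenge is $ch^*$, answers with
      $\mathsf{resp}=\psi$, which $\mathcal{V}$ accepts. For any other
      challenge $ch\neq ch^*$, $\mathcal{P}$ would have to find an isomorphism
      mapping $G_{ch}$ to $G'$, which would yield an isomorphism between
      $G_{ch}$ and $G_{ch^*}$ (a CGI2 solution), so it fails unless it can
      solve CGI2. Because $ch$ is chosen uniformly from
      $\{0,\dots,2^{130}-1\}$, the probability that $\mathcal{P}$ convinces
      $\mathcal{V}$ is $1/2^{130}$ (soundness error).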

    \item The soundness error after one run is $1/2^{130}$, which is larger
      than the required $1/2^{192}$. Repeating the protocol twice
      (independently) gives a soundness error of $(1/2^{130})^2 = 1/2^{260}$,
      which is well below the required $1/2^{192}$, so two repetitions are
      needed.
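      The number of repetitions follows from a one-line computation. The
      Python sketch below (with a hypothetical helper name) returns the
      smallest $r$ with $(2^{-130})^r\le 2^{-192}$.
\begin{verbatim}
def repetitions(bits_per_round, target_bits):
    """Smallest r with (2^-bits_per_round)^r <= 2^-target_bits."""
    return -(-target_bits // bits_per_round)  # ceiling division


print(repetitions(130, 192))
\end{verbatim}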

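    \item A simulator $\mathcal{S}$ is built as follows:
      \begin{itemize}
        \item $\mathcal{S}$ starts $\mathcal{V}^*$ on the public graphs $G_i$,
          $i\in\{0,\dots,2^{130}-1\}$.
        \item $\mathcal{S}$ guesses a challenge $\mathsf{ch}^*$, samples a
          random isomorphism $\psi$ and computes the commitment
          $G'\leftarrow\psi(G_{\mathsf{ch}^*})$, which it sends to
          $\mathcal{V}^*$.
        \item $\mathcal{S}$ receives a challenge $\mathsf{ch}$ from
          $\mathcal{V}^*$. If $\mathsf{ch}=\mathsf{ch}^*$, $\mathcal{S}$
          outputs $(G',\mathsf{ch}^*,\psi)$. If
          $\mathsf{ch}\neq\mathsf{ch}^*$, $\mathcal{S}$ rewinds
          $\mathcal{V}^*$ and returns to the second step.
      \end{itemize}
      Since $\psi$ is uniform, $G'$ has the same distribution for every guess
      $\mathsf{ch}^*$, so the challenge of $\mathcal{V}^*$ is independent of
      the guess and each attempt succeeds with probability $1/2^{130}$. The
      expected number of attempts is therefore $2^{130}$, and $\mathcal{S}$
      runs in expected time about $2^{130}n$. Treating $2^{130}$ as a constant
      independent of $n$, $\mathcal{S}$ is expected probabilistic
      polynomial-time and the protocol is zero-knowledge.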

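    \item For completeness, see 5e.

    Special soundness: given two accepting transcripts with the same
    commitment, $\mathsf{trans} = (G',\mathsf{ch},\mathsf{resp})$ and
    $\mathsf{trans}' = (G',\mathsf{ch}',\mathsf{resp}')$ with
    $\mathsf{ch}\neq\mathsf{ch}'$, we have
    $\mathsf{resp}(G_{\mathsf{ch}}) = G' = \mathsf{resp}'(G_{\mathsf{ch}'})$.
    Hence
    \[ \phi = (\mathsf{resp}')^{-1}\circ\mathsf{resp} \]
    satisfies $G_{\mathsf{ch}'} = \phi(G_{\mathsf{ch}})$, i.e., $\phi$ is an
    isomorphism between two distinct public graphs and solves the
    corresponding CGI2 instance. This means that the witness can be extracted
    with probability 1.

    For special HVZK: given $\mathsf{ch}\in\{0,\dots,2^{130}-1\}$, choose
    $\mathsf{resp}\xleftarrow{\$}\mathcal{I}_{1107}$ and compute
    $G'\leftarrow\mathsf{resp}(G_{\mathsf{ch}})$. In a real transcript,
    $\mathsf{resp} = \psi\circ\phi_{\mathsf{ch}}^{-1}$ is uniformly
    distributed in $\mathcal{I}_{1107}$ because $\psi$ is, and
    $G' = \psi(G) = \mathsf{resp}(G_{\mathsf{ch}})$ is determined by it.
    Therefore, real and simulated transcripts have the same distribution.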
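    The extractor for special soundness can be sketched on toy graphs. The
    Python code below assumes 8-vertex graphs and 4 public graphs, uses
    hypothetical helper names, and recovers an isomorphism between two public
    graphs from two accepting transcripts that share a commitment.
\begin{verbatim}
import random

N = 8  # toy number of vertices (the exam uses 1107)


def random_perm(rng):
    perm = list(range(N))
    rng.shuffle(perm)
    return perm


def relabel(perm, graph):
    """Relabel every edge {u, v} as {perm[u], perm[v]}."""
    return frozenset(frozenset(perm[v] for v in edge) for edge in graph)


def compose(a, b):
    """(a o b)(v) = a(b(v)): apply b first, then a."""
    return [a[b[v]] for v in range(N)]


def inverse(perm):
    inv = [0] * N
    for v, w in enumerate(perm):
        inv[w] = v
    return inv


def extract(trans, trans_prime):
    """From (G', ch, resp) and (G', ch', resp') with ch != ch', return
    phi = resp'^{-1} o resp, which maps G_ch to G_ch'."""
    (commit, ch, resp), (commit2, ch2, resp2) = trans, trans_prime
    if commit != commit2 or ch == ch2:
        raise ValueError("need the same commitment and different challenges")
    return compose(inverse(resp2), resp)


# Toy setup: base graph G, secret phi_i and public graphs G_i = phi_i(G).
rng = random.Random(7)
G = frozenset(frozenset((i, (i + 1) % N)) for i in range(N)) | {frozenset((0, 4))}
phis = [random_perm(rng) for _ in range(4)]
public = [relabel(phi, G) for phi in phis]

# Two accepting transcripts for the same commitment G' = psi(G).
psi = random_perm(rng)
commitment = relabel(psi, G)
t0 = (commitment, 0, compose(psi, inverse(phis[0])))
t1 = (commitment, 1, compose(psi, inverse(phis[1])))

phi = extract(t0, t1)
print(relabel(phi, public[0]) == public[1])
\end{verbatim}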

    \item $\mathsf{ID}_{\mathrm{CGI2}}$ can be used for authentication: a
      client (prover) proves to a server (verifier) that it possesses a
      secret, namely the secret isomorphisms, without revealing it. The server
      knows the client's public key (the public graphs $G_i$). When the client
      wants to log in, it sends a fresh commitment, receives a random
      challenge from the server and sends back the response. If the response
      maps the challenged public graph to the commitment, the client is
      authenticated.

      The advantage of such a scheme over conventional password-based
      authentication is that the secret is never transmitted to anyone.
      Furthermore, the server only stores public information, so, unlike
      stored password hashes, a compromised server database does not enable
      dictionary attacks.

    \item \TODO

    \item The signer first computes a commitment with a predefined soundness
      error (i.e., using enough parallel repetitions of the protocol). It then
      derives the challenge by hashing the message to be signed together with
      the commitment. Afterwards, it computes the response of the
      identification protocol for this challenge and commitment. The
      signature is the tuple consisting of the commitment and the response.

      The verifier recomputes the challenge from the message and the
      commitment and then checks that the response is valid for this
      commitment and challenge, i.e., that it maps the challenged public graph
      to the commitment. If it does, the signature is valid, otherwise it is
      invalid.

      In the random-oracle model, the signature scheme is
      $\mathsf{EUF}$-$\mathsf{CMA}$ secure if $\mathsf{ID}_{\mathrm{CGI2}}$
      satisfies special soundness and honest-verifier zero-knowledge, which it
      does. Informally, an attacker has only a negligible probability of
      producing a valid signature for a message that has not been queried
      before, because finding a valid response for a given commitment and
      challenge without knowing the secret isomorphisms is hard.

    \item The signature consists of the commitment, which is a hash, and the
      response. The hash function is chosen to be $\unit[256]{bits}$ long, and
      the response encodes a permutation of the 1107 vertices using
      $\lceil\log_2 1107\rceil = 11$ bits per vertex, i.e.,
      $11\cdot 1107 = \unit[12177]{bits} = \unit[1522.125]{bytes}$. In total,
      the signature is $32 + 1522.125 = \unit[1554.125]{bytes}$ long.

    \item The signature can be made smaller if the underlying graphs have
      fewer vertices. Since the response needs $n\lceil\log_2 n\rceil$ bits
      for $n$ vertices, the signature shrinks roughly linearly with the number
      of vertices.
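      The size estimate and its dependence on the number of vertices can be
      reproduced with a small Python function. It assumes, as above, a 256-bit
      hash and a response that stores each of the $n$ vertex images in
      $\lceil\log_2 n\rceil$ bits; the function name is hypothetical.
\begin{verbatim}
import math


def signature_bytes(n_vertices, hash_bits=256):
    """Hash plus a permutation of n vertices, each image stored in ceil(log2 n) bits."""
    bits_per_vertex = math.ceil(math.log2(n_vertices))
    response_bits = bits_per_vertex * n_vertices
    return (hash_bits + response_bits) / 8


for n in (1107, 512, 256):
    print(n, signature_bytes(n))
\end{verbatim}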

  \end{enumerate}

\end{enumerate}

\end{document}