\documentclass[a4paper]{article}
\usepackage[english]{babel}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{color}
\usepackage{units}

\newcommand{\TODO}{\textcolor{red}{TO DO}}

\begin{document}

\begin{center}
  \textbf{\Large NWI-IMC061 -- Applied Cryptography}\\[4pt]
  \textbf{\large Final Exam, Academic Year 2021--2022}
\end{center}

\bigskip
\hrule
\bigskip

\noindent
\textbf{Last Name:} Eidelpes

\medskip\noindent
\textbf{First Name:} Tobias

\medskip\noindent
\textbf{Student Number:} s1090746

\medskip\noindent
\textbf{Personalized Appendix Sequence Number:} 30

\bigskip
\hrule
\bigskip

\begin{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% SYMMETRIC - LITERATURE %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item \textbf{(18 points)}
\begin{enumerate}
\item EWCDM stands for \emph{Encrypted Wegman-Carter with Davies-Meyer}.
  As the name implies, EWCDM is based on a Wegman-Carter construction, which
  takes the hash of a message $M$ and XORs it with the output of a
  pseudorandom function (PRF) applied to a nonce $N$. This construction is
  very efficient and also has a strong security bound. However, it is very
  vulnerable to \emph{nonce-misuse}. To deal with that problem, the
  Wegman-Carter construction is wrapped by another call to the PRF with a
  different key. Another disadvantage is that PRFs are hard to come by, so
  pseudorandom permutations are used instead. If a pseudorandom permutation
  (i.e., a block cipher) is used, the security bound of the construction
  drops to the birthday bound ($2^{n/2}$). The authors replace the inner call
  to the PRF with the \emph{Davies-Meyer} construction
  \[ \mathrm{DM}[E]_K(N) = E_K(N)\oplus N \]
  and then encrypt the result (XORed with the hashed message) in another call
  to the block cipher.
  The resulting EWCDM construction looks like this
  \[ E_{K'}(E_K(N)\oplus N\oplus H_{K_h}(M)) \]
  and is secure \emph{beyond} the birthday bound against nonce-respecting
  adversaries while still offering birthday bound security against
  nonce-misusing adversaries.
\item The type of symmetric cryptographic scheme introduced is a Message
  Authentication Code (MAC).
\item The size of the key(s) depends on the block cipher and the keyed hash
  function. In total, two distinct keys are likely needed for the block
  cipher calls and one key for the hash function.
\item Since EWCDM is based on a block cipher and a hash function and because
  those usually operate on fixed-length inputs, the construction also
  operates on fixed-length inputs. Messages are of variable length and need
  to be padded to the specified block size.
\item Depending on the number of input blocks, the construction will generate
  multiples of the block size as outputs. The outputs are variable-length.
\item EWCDM is based on a pseudorandom permutation (i.e., a block cipher) and
  an almost XOR-universal (AXU) hash function (one-way function).
\item Yes, the authors delivered a security proof. The proof assumes that the
  encryption function $E$ is a secure pseudorandom permutation for the case
  of a nonce-misusing adversary. This requirement on the security of $E$ is
  not present if the adversary is nonce-respecting. Additionally, the
  distinguisher is computationally unbounded and never repeats a query.
\item The practical relevance is high, in my opinion. This is because the
  EWCDM construction is secure against nonce-misusing adversaries up to the
  birthday bound. It has been shown that implementing nonces securely is a
  difficult task. If a scheme is easily broken by wrong handling of nonces,
  there is no \emph{fallback} security guarantee. The EWCDM construction,
  however, provides such a \emph{fallback} security guarantee and is
  therefore of high practical relevance.
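  To make the two regimes concrete, here is a worked example under the
  assumption of a block size of $n=128$ bits (e.g.\ AES; the block size is
  not fixed by the answer above). It uses the bounds as I understand them
  from the EWCDM paper, namely about $2^{2n/3}$ MAC queries against
  nonce-respecting adversaries and about $2^{n/2}$ MAC queries against
  nonce-misusing adversaries, with $q$ denoting the number of MAC queries
  (a symbol introduced here for illustration):
  \[
    \text{nonce-respecting: } q \lesssim 2^{2n/3} = 2^{256/3} \approx 2^{85.3},
    \qquad
    \text{nonce-misuse: } q \lesssim 2^{n/2} = 2^{64}.
  \]
  Under these assumptions, nonce misuse costs roughly $21$ bits of security
  but does not break the scheme outright.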
\item Poly1305 is also a message authentication code (MAC), which we
  discussed in the lecture.
\item One advantage of EWCDM over Poly1305 is that the former is nonce-misuse
  resistant up to the birthday bound while Poly1305 is not.
\item One disadvantage of EWCDM is that it requires two calls to the
  underlying block cipher. This can have serious performance implications
  for small, low-resource embedded devices.
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% SYMMETRIC - KEYED %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item \textbf{(16 points)}
\begin{enumerate}
\item $\mathsf{CrAp}_K^{-1}$ operates by taking the ciphertexts
  $C_1,\dots,C_l$ and passing them to the decryption function
  $\widetilde{E}^{-1}(K,N,\cdot)$. The decryption function takes 128-bit
  inputs and produces 128-bit outputs. Each output has to be stripped of the
  counter (the last 26 bits) to obtain the 102-bit message blocks
  $M_1,\dots,M_l$. Finally, the padding (if any) has to be removed from
  $M_1,\dots,M_l$ to obtain the original message blocks (at most 102 bits
  each).
\item The length of the message $M$ is limited by the counter, which is at
  most 26 bits long. Since the very first counter ($\langle 0\rangle_{26}$)
  is reserved for the tag, $2^{26}-2$ message blocks remain. Every block
  (without the counter) is at most 102 bits long, which gives a maximum
  message length of $102\cdot (2^{26}-2) = \unit[6845103924]{bits}$.
\item $\widetilde{E}$ should behave like a pseudorandom permutation in order
  to be able to prove the security of $\mathsf{CrAp}$. If it does not, a
  distinguisher is able to gain a significant advantage because the block
  cipher does not actually generate \emph{random} outputs. Further, if the
  security of the underlying primitive is broken, the whole scheme falls
  apart.
\item \TODO
\item \TODO
\item The length of the random nonce $N$ is $\unit[96]{bits}$.
  By the birthday bound, the expected number of evaluations an attacker has
  to make to obtain a repeated nonce is about $2^{96/2} = 2^{48}$.
\item After $2^b = 2^{62}$ forgery attempts, the attacker has exhausted the
  tag space because the tag $T$ is of size $\unit[62]{bits}$. The
  distinguisher repeatedly checks whether the current tag is accepted for the
  ciphertext. If it is not, the tag is incremented by one until $2^{62}$
  queries have been made. Eventually, the distinguisher will hit the valid
  tag and is then able to identify whether it is in the real world or in the
  ideal world.
\item \TODO
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% SYMMETRIC - UNKEYED %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item \textbf{(16 points)}
\begin{enumerate}
\item The chaining value size is $\unit[101]{bits}$ ($=g$) and the message
  block size is $655-101=\unit[554]{bits}$.
\item If the message is of size $|M|=\unit[1234567]{bits}$ and the block size
  is $\unit[554]{bits}$, we need $\lceil 1234567/554\rceil = 2229$ blocks,
  i.e.\ $2229\cdot 554=\unit[1234866]{bits}$, with a padding of
  $1234866-1234567=\unit[299]{bits}$. Additionally, the message length is
  encoded in a separate $\unit[554]{bit}$ block, so the total number of
  blocks is $2230$. The total number of blocks corresponds to the total
  number of evaluations needed, which is $2230$ evaluations of $P$.
\item In order for $(x,y)$ to be a valid preimage for the compression
  function $F^P$, $x$ must be of size $\unit[800]{bits}$ and contain 55 zeros
  at the beginning and 90 zeros at the end. The 655 bits in between can be
  chosen freely by an adversary to reach the required target. Similarly, $y$
  must be of size $\unit[800]{bits}$, where the first 50 and the last 649
  bits are discarded. The 101 bits in between must all be zero to match our
  target image. Furthermore, the pair must be consistent with the
  permutation, i.e., $P(x)=y$ must hold, where $x$ and $y$ satisfy the
  aforementioned conditions.
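  The conditions above can be sketched in one line. This is only a
  restatement under the assumption that $(x,y)$ is a query-response pair of
  $P$; the names $z$, $u$ and $v$ are introduced here for illustration and do
  not appear in the exam text:
  \[
    x = 0^{55}\,\|\,z\,\|\,0^{90},\quad z\in\{0,1\}^{655},
    \qquad
    y = P(x) = u\,\|\,0^{101}\,\|\,v,\quad u\in\{0,1\}^{50},\ v\in\{0,1\}^{649}.
  \]
  As a length check, $55+655+90=800$ and $50+101+649=800$, so both $x$ and
  $y$ are full $\unit[800]{bit}$ strings.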
\item If the adversary makes one forward query, the probability that it hits
  the target image is $1/2^{g} = 1/2^{101}$. The adversary wants to find a
  $\unit[655]{bit}$ input that maps to 101 zeros. The 101 relevant output
  bits of a fresh query can take $2^{101}$ values, so the probability of
  success with one query is $1/2^{101}$.
\item \TODO
\item \TODO
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% ASYMMETRIC - LITERATURE %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item \textbf{(17 points)}
\begin{enumerate}
\item LEDAcrypt is a post-quantum asymmetric suite of cryptosystems. It
  contains a public-key encryption scheme and a key-encapsulation mechanism
  (KEM). The underlying hard problem (arbitrary linear binary code decoding)
  is currently believed to be secure against quantum adversaries.
\item The authors introduce a post-quantum public-key cryptosystem based on
  linear codes.
\item IND-CCA2 is proven for both the KEM and the PKC. IND-CPA is proven for
  the KEM.
\item LEDAcrypt is based on the hardness of the decoding problem for linear
  codes. Given a parity-check matrix $H$ and a received codeword $y$, the
  syndrome is $s=yH$. The best estimate for the received codeword is
  $x=y+z_0$. Find a minimum-weight solution $z_0$ for the equation $s=zH$.
  Finding a minimum-weight solution to $s=zH$ given $s$ and $H$ is
  $\mathsf{NP}$-hard.
\item The private key in LEDAcrypt consists of two binary matrices $Q$ and
  $H$. The public key is constructed from the matrix $L=Q\cdot H$. The
  security of the scheme relies on the fact that obtaining the original
  information from a perturbed codeword is hard unless the factorization of
  the public key ($Q\cdot H$) is known. If the aforementioned problem of
  decoding linear codes has a polynomial-time solution, an attacker will also
  easily be able to obtain the factorization of the public key. If that were
  possible, the scheme would be broken.
\item The strongest type of security the authors claim to achieve is
  IND-CCA2.
  The authors use the Fujisaki-Okamoto transform to achieve IND-CCA2
  security.
\item The scheme can be used to exchange symmetric keys between parties using
  the key encapsulation mechanism (KEM). In that scenario, the sender
  encrypts a symmetric key with LEDAcrypt and shares the encrypted key with
  the other party. The other party then decrypts the message to obtain the
  symmetric key, which can be used for further communication.
\item The lowest security level treated by the authors is level 1 of the NIST
  security levels, corresponding to AES-128. The parameters depend on whether
  the scheme is used for ephemeral or long-term keys and what kind of code
  rate ($n_0$) is needed. For ephemeral keys with $n_0=2$ the authors suggest
  the values $p=14{,}939$, $t=136$, $d_v=11$ and $m=[4,3]$. For long-term
  keys the authors suggest the values $p=35{,}899$, $t=136$, $d_v=9$,
  $m=[5,4]$, $\overline{t}=4$ and $b_0=44$. These parameters are chosen with
  respect to an adversary using Information Set Decoding (ISD) to find a
  solution to the underlying hard problem.
\item The size for ephemeral keys is $\unit[452]{bytes}$ (in memory) for the
  private key and $\unit[1872]{bytes}$ for the public key. The size for
  long-term keys is $\unit[468]{bytes}$ (in memory) for the private key and
  $\unit[4488]{bytes}$ for the public key.
\item Kyber512 is also a KEM and achieves the same level of (classical)
  security.
\item One advantage of LEDAcrypt is that the key sizes are relatively small
  compared to Classic McEliece, for example. Small key sizes are important
  for transmission of public keys so that they can fit in commonly used
  packet sizes.
\item One disadvantage of the scheme is that it inherently has a non-zero
  decoding failure rate (DFR). For ephemeral keys and the lowest security
  level, the authors advertise an error probability of $14$ out of
  $1.2\cdot 10^9$ decodes. The DFR can be lowered by choosing different
  parameters, but the rate is arguably still too high for practical use.
  For long-term keys the authors state that 95 out of 100 keys (lowest
  security level) provide a DFR of $2^{-64}$, which is also arguably too high
  for extended use well into the future.
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% ASYMMETRIC - SECURITY %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\item \textbf{(33 points)}
\begin{enumerate}
\item Let there be an adversary $\mathcal{A}$ which breaks CGI. We can then
  construct an adversary $\mathcal{B}$ which breaks CGI2. Suppose
  $\mathcal{B}$ is given a CGI2 instance $(\mathcal{G}_a,\mathcal{G}_b)$
  where $a\neq b$ and $\mathcal{G}_a$ and $\mathcal{G}_b$ are in the set of
  $2^{130}$ graphs isomorphic to $\mathcal{G}$. The goal of $\mathcal{B}$ is
  to find, with non-negligible advantage, an isomorphism $\phi$ such that
  $\mathcal{G}_a = \phi(\mathcal{G}_b)$. $\mathcal{B}$ will give
  $(\mathcal{G}_a,\mathcal{G}_b)$ to $\mathcal{A}$ and $\mathcal{A}$ will
  output an isomorphism $\phi$ which satisfies
  $\mathcal{G}_a = \phi(\mathcal{G}_b)$. $\mathcal{B}$ can then take this
  isomorphism and apply it to its own problem to obtain the solution.
\item First, the prover takes a random isomorphism and generates a
  permutation of the given graph $\mathcal{G}$. The resulting graph is the
  commitment, which is sent to the verifier. The verifier then picks a random
  graph from the set of graphs isomorphic to $\mathcal{G}$ and sends it to
  the prover. The prover takes this graph and calculates the permutation
  needed to arrive at the original graph $\mathcal{G}$. This is the response,
  which is sent to the verifier. The verifier can then use the response to
  check if the graph it picked earlier (in the challenge) is actually
  isomorphic to $\mathcal{G}$. If it is, the verifier accepts, otherwise it
  rejects.
\item The domain of the commitment scheme is the set of graphs isomorphic to
  $\mathcal{G}$ and the range is the number ($2^{130}$) of isomorphic graphs.
  The scheme consists of three phases: setup, commitment and opening. The
  setup phase consists of choosing an appropriate random permutation $\psi$
  from the set of isomorphisms on $\mathcal{G}$. The commitment phase takes
  the isomorphism $\psi$ and the graph $\mathcal{G}$ as input and produces a
  commitment $\mathcal{G}'$. The opening phase takes an isomorphism
  $\mathsf{resp}$ and another graph $\mathcal{G}_{\mathsf{ch}}$ isomorphic to
  $\mathcal{G}$ as well as the original commitment as input and outputs
  $\top$ if the result matches $\mathcal{G}'$ and $\bot$ otherwise.
\item Computational binding: Suppose
  $\mathsf{Comm}(\psi,\mathcal{G}_0) = \mathsf{Comm}(\psi,\mathcal{G}_1)$.
  This means that $\psi(\mathcal{G}_0) = \psi(\mathcal{G}_1)$ and the
  adversary has found an isomorphism which maps two different graphs to the
  same output, which corresponds to solving the CGI problem.
\item If $G_{ch}=\phi_{ch}(G)$ and $G'=\psi(G)$, it follows that
  $G=\phi_{ch}^{-1}(G_{ch})$ and therefore
  $G'=\psi(\phi_{ch}^{-1}(G_{ch}))$, so the verifier will always accept.
\item Suppose $G_{ch}$ is not isomorphic to $G$. $\mathcal{P}$ prepares in
  advance for a challenge $ch^*$ and so
  $G'=\psi(\phi_{ch^*}^{-1}(G_{ch^*}))$. $\mathcal{P}$ commits to $G'$. If
  the challenge by $\mathcal{V}$ is $ch^*$ (so $ch=ch^*$), $\mathcal{V}$
  accepts, otherwise it rejects. Because $ch\in\{0,\dots,2^{130}-1\}$, the
  probability that $\mathcal{P}$ convinces $\mathcal{V}$ is $1/2^{130}$
  (soundness error).
\item The soundness error after one iteration is $1/2^{130}$, which is larger
  than the required $1/2^{192}$. Running the protocol twice gives a soundness
  error of $(1/2^{130})^2 = 1/2^{260}$, which is well below the required
  $1/2^{192}$, so two iterations suffice.
\item A simulator $\mathcal{S}$ is built as follows:
  \begin{itemize}
  \item $\mathcal{S}$ starts $\mathcal{V}^*$ with $G_i$ and
    $i\in\{0,\dots,2^{130}-1\}$.
  \item $\mathcal{S}$ makes a guess $\mathsf{ch}^*$ and calculates
    $G'\leftarrow\psi(\phi_{ch^*}^{-1}(G_{ch^*}))$.
  \item $\mathcal{S}$ gets a challenge $\mathsf{ch}$ from $\mathcal{V}^*$. If
    $\mathsf{ch}=\mathsf{ch}^*$, $\mathcal{S}$ outputs
    $(G',\mathsf{ch}^*,\phi_{ch^*}^{-1}\psi)$. If
    $\mathsf{ch}\neq\mathsf{ch}^*$, $\mathcal{S}$ rewinds $\mathcal{V}^*$ and
    goes back to the second step.
  \end{itemize}
  The simulator $\mathcal{S}$ is expected probabilistic polynomial-time with
  $2^{130}n$ time and the protocol is zero-knowledge.
\item For completeness see 5e. Special soundness: given two accepting
  transcripts for the same commitment,
  $\mathsf{trans} = (G',\mathsf{ch},\mathsf{resp})$ with
  $\mathsf{resp}=\phi_{ch}^{-1}\psi$ and
  $\mathsf{trans}' = (G',\mathsf{ch'},\mathsf{resp}')$ with
  $\mathsf{resp}'=\phi_{ch'}^{-1}\psi$ and $\mathsf{ch}\neq\mathsf{ch'}$,
  both responses map their challenge graph to the commitment, i.e.,
  $\mathsf{resp}(G_{ch}) = G' = \mathsf{resp}'(G_{ch'})$. Hence
  \[ G_{ch'} = (\mathsf{resp}')^{-1}\bigl(\mathsf{resp}(G_{ch})\bigr), \]
  which means that an isomorphism between $G_{ch}$ and $G_{ch'}$ (the
  witness) can be extracted with probability 1. For special HVZK: given
  $\mathsf{ch}\in\{0,\dots,2^{130}-1\}$ choose
  $\mathsf{resp}\xleftarrow{\$}\mathcal{I}_{1107}$ and calculate
  $G'\leftarrow\mathsf{resp}(G_{ch})$. The distributions of real transcripts
  and simulated transcripts are the same. A given valid transcript occurs
  with probability $1/2^{130}$.
\item $\mathsf{ID}_{\mathrm{CGI2}}$ can be used for authentication if a
  client (prover) proves to a server (verifier) the possession of a password
  without actually revealing it. The client shares a commitment with the
  server and as soon as the client wants to log in, it receives a challenge
  from the server. If the client can successfully pass the challenge (i.e.,
  the response from the client is consistent with the commitment), it is
  authenticated with the server. The advantage of such a scheme over
  conventional password-based authentication is that the secret is never
  transmitted to anyone. Furthermore, the commitment is not vulnerable to
  dictionary attacks, which are a common threat to password hashes stored on
  the server's side.
\item \TODO
\item The signer calculates a commitment with a predefined soundness error.
  Then the signer calculates the challenge by taking the hash of the message
  to be signed and the commitment. Afterwards, it will run the protocol again
  and calculate a response for the created challenge (hash) and the
  commitment. The signature is a tuple of the commitment and the response.
  The verifier can calculate the challenge on its own from the message and
  the commitment and then verifies that the response matches the commitment
  for that challenge. If it does, the signature is valid, otherwise it is
  invalid. The signature is $\mathsf{EUF}$-$\mathsf{CMA}$ secure if
  $\mathsf{ID}_{\mathrm{CGI2}}$ satisfies special soundness and honest
  verifier zero-knowledge, which it does. Furthermore, it is secure if the
  attacker has a negligible probability of finding a valid signature for a
  message which has not been queried before. This rests on the fact that
  finding an isomorphism for a specific commitment and challenge which
  matches the response is hard.
\item The size of the signature comprises the commitment, which is a hash,
  and the response. The hash function is chosen to be $\unit[256]{bits}$
  ($\unit[32]{bytes}$) and the response is
  $\lceil\log_2 1107\rceil\cdot 1107 = 11\cdot 1107 = \unit[12.177]{kbit} = \unit[1522.125]{bytes}$.
  In total, the signature is $\unit[1554.125]{bytes}$ in size.
\item The signature can be made smaller if the underlying graphs have fewer
  vertices. The response size $\lceil\log_2 n\rceil\cdot n$ shrinks almost
  linearly with the number of vertices $n$.
\end{enumerate}

\end{enumerate}

\end{document}