
\documentclass[12pt,a4paper]{article}

\usepackage[cm]{fullpage}
\usepackage{amsthm}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{xspace}
\usepackage[english]{babel}
\usepackage{fancyhdr}
\usepackage{titling}
\usepackage{hyperref}
\renewcommand{\thesection}{Exercise \Alph{section}:}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This part needs customization from you %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newcommand{\groupnumber}{04}
\newcommand{\name}{Tobias Eidelpes, Mehmet Ege Demirsoy, Nejra Komic}
\newcommand{\matriculation}{01527193, 01641187, 11719704}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%           End of customization         %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newcommand{\projnumber}{1}
\newcommand{\Title}{Analysing the Blockchain}
\setlength{\headheight}{15.2pt}
\setlength{\headsep}{20pt}
\setlength{\textheight}{680pt}
\pagestyle{fancy}
\fancyhf{}
\fancyhead[L]{Cryptocurrencies - Project \projnumber\ - Analysing the Blockchain}
\fancyhead[C]{}
\fancyhead[R]{\name}
\renewcommand{\headrulewidth}{0.4pt}
\fancyfoot[C]{\thepage}


\begin{document}
\thispagestyle{empty}
\noindent\framebox[\linewidth]{%
 \begin{minipage}{\linewidth}%
 \hspace*{5pt} \textbf{Cryptocurrencies (WS2021/22)} \hfill Prof.~Matteo Maffei \hspace*{5pt}\\

 \begin{center}
  {\bfseries\Large Project \projnumber~-- \Title}
 \end{center}

 \vspace*{5pt}\hspace*{5pt} \hfill TU Wien \hspace*{5pt}
\end{minipage}%
}
\vspace{0.5cm}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section*{Group \groupnumber}
Our group consists of the following members:
\begin{center}
\textbf{\name} %please fill the information above

\matriculation %please fill the information above
\end{center}

\section{Finding invalid blocks}

For this exercise we had to find all invalid blocks contained in the database
provided to us. While there is an
official\footnote{\url{https://en.bitcoin.it/wiki/Protocol\_rules\#.22block.22\_messages}}
algorithm which allows network participants to verify whether a block is valid,
the stripped-down version of the blockchain we received does not require all of
its steps. The reduced version of the algorithm consists of the following
checks, each of which describes a constraint the data must satisfy:

\begin{enumerate}
    \item All blocks which do not have the coinbase transaction as their first
        transaction are invalid. We check this by creating a view which lists
        all coinbase transactions. We then query the database for the first
        transaction of each block and check whether that transaction is in the
        view of all coinbase transactions. If it is not, we reject the block and
        add it to the list of invalid blocks.
    \item All blocks which contain transactions which do not have inputs or
        outputs are invalid. We split this task into two queries, one for
        checking if a block contains transactions with zero inputs and another
        one for checking if a block contains transactions with zero outputs.
    \item All blocks which have transactions with an invalid output value or
        where the sum of all output values exceeds the legal money range are
        invalid. This task is split into two queries as well: one checks
        whether individual output values are outside of the legal money range,
        the other checks whether the sum of all output values per transaction
        is outside of the legal money range.
    \item All blocks which have transactions with inputs that do not have a
        corresponding output are invalid. For this task we first create a view
        of all non-coinbase transactions. The inputs are then restricted to
        those which belong to one of these transactions (the non-coinbase
        inputs). Finally, the non-coinbase inputs are joined with the outputs;
        rows containing \texttt{NULL} as their \texttt{value}, i.e.\ inputs
        without a matching output, indicate an invalid block.
    \item All blocks which contain transactions where the input's
        \texttt{sig\_id} field is not the same as the \texttt{pk\_id} field of
        the output it spends are invalid. Since we are not interested in the
        coinbase transactions, the query again joins the non-coinbase inputs
        with the outputs. If the two fields do not match, the block is invalid.
    \item All blocks which have inputs that spend outputs which have already
        been spent are invalid. This task is split into three queries. First,
        we find all outputs which are referenced by more than one input.
        Second, for each of these outputs, we find the input which spent it
        first. Third, the two results are combined such that blocks containing
        an input which spends such an output but is not its first spending
        occurrence are marked as invalid.
    \item All blocks containing inputs which are not in the legal money range
        are invalid. First, we construct a view which lists all transactions
        together with the sum of the values of their inputs. All blocks
        containing input sums outside of the legal money range are marked as
        invalid. Second, we reuse the view of all non-coinbase inputs and
        filter it for the inputs whose referenced output value lies outside of
        the legal money range.
    \item All blocks where the sum of input values is smaller than the sum of
        output values are invalid. This task allows us to reuse the view created
        earlier of all input sums. Additionally, the sum of output values is
        obtained similarly to the input sums. After joining both input sums and
        output sums, we can filter for blocks which have smaller input sums than
        output sums. Those blocks are invalid.
    \item All blocks where the coinbase value is larger than the sum of the
        block creation fee and all transaction fees are invalid. This task is
        split into four queries. First, we create a view which lists all block
        ids and their coinbase values. Second, we compute the sum of all input
        values per block. Third, we repeat that query for the sum of the output
        values per block. Lastly, these three results are joined and all blocks
        whose coinbase value exceeds the block creation fee plus the
        transaction fees (the difference between input and output sums) are
        marked as invalid.
\end{enumerate}
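
A minimal sketch of how items 4 and 5 could be expressed with a single join is
given below. It is an illustration under assumptions and not a definitive
implementation: the view name \texttt{non\_coinbase\_inputs}, the table
\texttt{transactions} and the columns \texttt{tx\_id}, \texttt{block\_id} and
\texttt{output\_id} are hypothetical, while \texttt{sig\_id}, \texttt{pk\_id}
and \texttt{value} are the fields described above.

\begin{verbatim}
-- Sketch only: non_coinbase_inputs, transactions, tx_id, block_id
-- and output_id are assumed names.
SELECT DISTINCT t.block_id
FROM non_coinbase_inputs AS i
JOIN transactions AS t ON t.tx_id = i.tx_id
LEFT JOIN outputs AS o ON o.output_id = i.output_id
WHERE o.value IS NULL        -- item 4: no matching output was found
   OR i.sig_id <> o.pk_id;   -- item 5: sig_id and pk_id differ
\end{verbatim}

The \texttt{LEFT JOIN} keeps inputs without a matching output, so that their
\texttt{value} shows up as \texttt{NULL} instead of the row being dropped.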

Finally, the invalid blocks are written to the \texttt{invalid\_blocks} table
and all duplicates are removed.
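
The value-related checks (items 3 and 7 to 9) can also be summarized as
inequalities. The notation is introduced here only for illustration: $I(t)$ and
$O(t)$ denote the inputs and outputs of a transaction $t$, $v(x)$ the value of
an output $x$ (or of the output spent by an input $x$), $T_b$ the non-coinbase
transactions of a block $b$, $c_b$ its coinbase value and $s_b$ its block
creation fee. We further assume that the legal money range is the one used by
Bitcoin, so that $V_{\max} = 21 \cdot 10^{6}$~BTC $= 2.1 \cdot 10^{15}$
satoshi. Under these assumptions, a block $b$ passes the checks if
\begin{gather*}
    0 \le v(x) \le V_{\max}, \qquad
    \sum_{y \in I(t)} v(y) \le V_{\max}, \qquad
    \sum_{y \in O(t)} v(y) \le V_{\max}, \\
    \sum_{y \in I(t)} v(y) \ge \sum_{y \in O(t)} v(y), \\
    c_b \le s_b + \sum_{t \in T_b} \Bigl( \sum_{y \in I(t)} v(y)
        - \sum_{y \in O(t)} v(y) \Bigr),
\end{gather*}
where the first line applies to every transaction $t$ of $b$ and every
$x \in I(t) \cup O(t)$ (with the input terms dropped for the coinbase
transaction) and the second line applies to every $t \in T_b$. The sum in the
last line is the total of the transaction fees of the block.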

\section{UTXOs}
In this exercise we were given a smaller data set than in the first exercise and had to work with unspent transaction outputs (UTXOs), which are defined as follows:
\begin{enumerate}
\item A transaction output is unspent if it is not used as an input to a later transaction.
\end{enumerate}
The exercise further specifies the following requirements:

\begin{enumerate}
\item The table \texttt{utxos} with columns \texttt{output\_id} and \texttt{value} should contain all UTXOs as of the last block of the data set. To satisfy this requirement, we select those outputs from the \texttt{outputs} table whose \texttt{output\_id} is not referenced in the \texttt{inputs} table. We express this condition with a \texttt{WHERE NOT EXISTS} clause.

\item The table \texttt{number\_of\_utxos} with column \texttt{utxo\_count} should contain as single entry the total
number of UTXOs. Here we only need to count the \texttt{output\_id}s in the \texttt{utxos} table filled for the previous requirement; a \texttt{COUNT(output\_id)} is sufficient.

\item The table \texttt{id\_of\_max\_utxo} with the column \texttt{max\_utxo} should contain as single entry the id of
the UTXO with the highest associated value. To obtain it, we order the \texttt{utxos} table by value in descending order, which places the UTXO with the highest value first. Adding a \texttt{LIMIT 1} clause then returns only this top entry.
\end{enumerate}

For each requirement, we insert the result into the corresponding given table.
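
The three steps above can be sketched as follows. This is a minimal
illustration which assumes that the column \texttt{output\_id} of the
\texttt{inputs} table references the output that an input spends; the table
and column names of the result tables are the ones given in the exercise.

\begin{verbatim}
-- Sketch only: inputs.output_id is assumed to reference the spent output.
INSERT INTO utxos (output_id, value)
SELECT o.output_id, o.value
FROM outputs AS o
WHERE NOT EXISTS (
    SELECT 1 FROM inputs AS i WHERE i.output_id = o.output_id
);

INSERT INTO number_of_utxos (utxo_count)
SELECT COUNT(output_id) FROM utxos;

INSERT INTO id_of_max_utxo (max_utxo)
SELECT output_id FROM utxos ORDER BY value DESC LIMIT 1;
\end{verbatim}

If several UTXOs share the highest value, \texttt{LIMIT 1} returns an arbitrary
one of them unless a tie-breaker such as \texttt{output\_id} is added to the
\texttt{ORDER BY} clause.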


\section{De-anonymization}
In this exercise we had to attempt a de-anonymization using the following two heuristics:

\begin{enumerate}
\item Joint control: addresses used as inputs to a common transaction are controlled by the same
entity.

\item Serial control: the output address of a transaction with only a single input and output is usually
controlled by the same entity owning the input addresses.
\end{enumerate}

The first part of this exercise was to insert all pairs of addresses satisfying one of the two heuristics above into the table \texttt{addressRelations}. For this we first create two views, one for each heuristic, containing the transactions to which the heuristic applies. We then use these views to find pairs of addresses by performing (multiple) joins with the \texttt{inputs} table and insert the result into a temporary table called \texttt{tempRelations}. Since the result contains reflexive and symmetric pairs, we define additional queries which delete these pairs from the \texttt{tempRelations} table. With the reflexive and symmetric pairs deleted, we insert the remaining \texttt{tempRelations} pairs into \texttt{addressRelations}, which concludes the first part.
\\
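As an illustration of the joint control heuristic, the pairs could also be
generated so that reflexive and symmetric pairs never occur, instead of
deleting them afterwards. The sketch below is a variant under assumptions and
not necessarily the approach described above: the column \texttt{tx\_id}, the
column names \texttt{addr1} and \texttt{addr2}, and the use of
\texttt{sig\_id} as the address of an input are all hypothetical. The strict
comparison in the join condition drops pairs of the form $(x, x)$ and keeps
only one of $(x, y)$ and $(y, x)$.

\begin{verbatim}
-- Sketch only: tx_id, addr1 and addr2 are assumed names, as is the
-- use of sig_id as the address of an input.
INSERT INTO addressRelations (addr1, addr2)
SELECT DISTINCT a.sig_id, b.sig_id
FROM inputs AS a
JOIN inputs AS b
  ON a.tx_id = b.tx_id      -- both inputs belong to one transaction
 AND a.sig_id < b.sig_id;   -- no reflexive or mirrored pairs
\end{verbatim}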
For the second part, the function \texttt{clusterAddresses()} was provided, which clusters the address pairs into entities with (artificial) ids and returns a table of entity ids and the addresses belonging to these entities. As a first step, we save the result of the function into a temporary table. After this, the following requirements have to be satisfied:

\begin{enumerate}
\item The table \texttt{max\_value\_by\_entity} with column \texttt{value} should contain as single entry the
maximum total value of (unspent) satoshis controlled by one cluster (one entity). To simplify the later queries, we save all UTXOs together with their addresses, transaction ids, \texttt{output\_id}s and values into a temporary table. We then join this table with the table which maps addresses to clusters, group the rows by entity id and apply the built-in \texttt{SUM} function to the values. Applying the built-in \texttt{MAX} function to this result yields the maximum value.
\\
(Note: For readability, our solution saves the \texttt{SUM} results into a temporary table and queries this table for the maximum value.)

\item The table \texttt{min\_addr\_of\_max\_entity} with column \texttt{addr} should contain as single entry the
(numerically) lowest address of the cluster (the entity) controlling the most total (unspent)
bitcoins. To solve this, we first create a temporary table called \texttt{temp\_max\_entity} which contains the entity id, the addresses and the UTXO values of the entity with the maximum total UTXO value from requirement 1. We then use this table to select all addresses of this entity from the cluster table and save them into another temporary table called \texttt{max\_entity\_all\_addresses}. As the last step, we perform a \texttt{MIN} query on \texttt{max\_entity\_all\_addresses} to obtain the lowest address.

\item The table \texttt{max\_tx\_to\_max\_entity} with column \texttt{tx\_id} should contain as single entry the
transaction sending the greatest number of bitcoins to the cluster (the entity) controlling
the most total (unspent) bitcoins. We start by creating a temporary table called \texttt{max\_tx\_value\_to\_max\_entity}, which stores the value of the transaction that sends the largest amount of coins to an address of the max entity. To obtain the transaction ids for this query, we join \texttt{outputs} with \texttt{max\_entity\_all\_addresses}. We construct this join in a \texttt{WITH \ldots{} AS} clause called \texttt{max\_entity\_join\_outputs}. After that we use the same join \texttt{max\_entity\_join\_outputs} again, but additionally filter the result by the value stored in \texttt{max\_tx\_value\_to\_max\_entity}, which leaves us with the desired transaction with the maximum value. The id of this transaction is then inserted into the given table.
\end{enumerate}
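
A minimal sketch of the aggregation behind the first requirement is given
below. The table names \texttt{clusters} (with columns \texttt{id} and
\texttt{address}) and \texttt{utxo\_addresses} (with columns \texttt{address}
and \texttt{value}) are hypothetical stand-ins for the two temporary tables
described above; only \texttt{max\_value\_by\_entity} and its column
\texttt{value} are given by the exercise.

\begin{verbatim}
-- Sketch only: clusters(id, address) and utxo_addresses(address, value)
-- are assumed names for the two temporary tables.
INSERT INTO max_value_by_entity (value)
SELECT MAX(total)
FROM (
    SELECT c.id, SUM(u.value) AS total   -- unspent value per entity
    FROM clusters AS c
    JOIN utxo_addresses AS u ON u.address = c.address
    GROUP BY c.id
) AS entity_totals;
\end{verbatim}

The inner query corresponds to the \texttt{SUM} step and the outer query to
the \texttt{MAX} step; storing the inner result in a temporary table, as noted
in the first item, allows the later requirements to reuse it.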

\section*{Work distribution}
%Fill in here an overview on which group member participated in which task and to which extent

\begin{description}
    \item[Tobias Eidelpes] Code and report for Exercise A.
    \item[Mehmet Ege Demirsoy] Code and report for Exercise C.
    \item[Nejra Komic] Code and report for Exercise B.
\end{description}

\end{document}