Review text so far, add titlepage and erklaerung

This commit is contained in:
Tobias Eidelpes 2020-03-31 17:49:05 +02:00
parent 50e9bdac73
commit 3eb33ae783
5 changed files with 100 additions and 57 deletions

6
.gitignore vendored

@ -18,3 +18,9 @@ main.pdf
main.run.xml main.run.xml
main.synctex.gz main.synctex.gz
main.toc main.toc
main.acr
main.alg
main.glg
main.gls
main.ilg
main.ind

29
chapters/erklaerung.tex Normal file

@ -0,0 +1,29 @@
\documentclass[../main.tex]{subfiles}
\begin{document}
\chapter*{Declaration of Authorship}
\textsf{Tobias Eidelpes} \\
I hereby declare that I have written this thesis independently, that I have
fully cited the sources and aids used, and that I have clearly marked as
borrowed, in every case with a reference to the source, all parts of the thesis
(including tables, maps and figures) that are taken verbatim or in substance
from other works or from the Internet.
\vspace{2cm}
\bigskip
\begin{minipage}{0.55\textwidth}
\textsf{Vienna, 31 March 2020} \\
\end{minipage}
\begin{minipage}{0.45\textwidth}
\begin{tabular}{c}
\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ \\
\textsf{Tobias Eidelpes}
\end{tabular}
\end{minipage}
\end{document}


@ -27,7 +27,7 @@ identifiers.
\section{Session-based Tracking Methods} \section{Session-based Tracking Methods}
\label{sec:session-based tracking methods} \label{sec:session-based tracking methods}
One of the simplest and most used forms of tracking on the Internet rely on One of the simplest and most used forms of tracking on the Internet relies on
sessions. Since HTTP is a stateless protocol, web servers cannot by default keep sessions. Since HTTP is a stateless protocol, web servers cannot by default keep
track of any previous client requests. In order to implement specific features track of any previous client requests. In order to implement specific features
such as personalized advertising, some means to save current and recall previous such as personalized advertising, some means to save current and recall previous
@ -94,12 +94,12 @@ web \cite{westMeasuringPrivacyDisclosures2014}.
\subsection{Hidden Form Fields} \subsection{Hidden Form Fields}
\label{subsec:hidden form fields} \label{subsec:hidden form fields}
The \gls{HTML} provides a specification for form elements, which allow users to The \gls{HTML} provides a specification for form elements, which allows users to
submit information (e.g., for authentication) to the server via POST or GET submit information (e.g., for authentication) to the server via POST or GET
methods. Normally, a user would input data into a form and on clicking methods. Normally, a user would input data into a form and on clicking
\emph{submit} the input would be sent to the server. Sometimes it is necessary \emph{submit} the input would be sent to the server. Sometimes it is necessary
to include additional information that the user did not enter. For this reason to include additional information that the user did not enter. For this reason
there exist \emph{hidden} web forms. Hidden web forms do not show on the website there exist \emph{hidden} web forms. Hidden web forms do not show on the web site
and therefore the user cannot enter any information. Similar to \gls{URL} and therefore the user cannot enter any information. Similar to \gls{URL}
parameters, the value parameter in a hidden field contains additional parameters, the value parameter in a hidden field contains additional
information like the user's preferred language for example. Since almost information like the user's preferred language for example. Since almost
@ -126,7 +126,7 @@ is sent to the server along with the data the user has filled in.
\subsection{HTTP Referer} \subsection{HTTP Referer}
\label{subsec:http referer} \label{subsec:http referer}
Providers of web services often want to know where visitors to their website Providers of web services often want to know where visitors to their web site
come from to understand more about their users and their browsing habits. The come from to understand more about their users and their browsing habits. The
\gls{HTTP} specification accounts for this by introducing the \emph{\gls{HTTP} \gls{HTTP} specification accounts for this by introducing the \emph{\gls{HTTP}
Referer field} [\emph{sic}] in the header. By checking the referrer, the server Referer field} [\emph{sic}] in the header. By checking the referrer, the server
@ -147,7 +147,7 @@ identifiability of users on the web.
\label{subsec:explicit authentication} \label{subsec:explicit authentication}
Explicit authentication requires a user to \emph{explicitly} log in or register Explicit authentication requires a user to \emph{explicitly} log in or register
to the website. This way, specific resources are only available to the user when to the web site. This way, specific resources are only available to the user when
he or she has authenticated themselves to the service. Actions taken on an he or she has authenticated themselves to the service. Actions taken on an
authenticated user account are tied to that account and crafting a personal authenticated user account are tied to that account and crafting a personal
profile is more or less a built-in function in this case. Since merely asking a profile is more or less a built-in function in this case. Since merely asking a
@ -167,12 +167,12 @@ efforts are not detected by the average user \cite{}, it is known that actions
taken on an account are logged to provide better service through service taken on an account are logged to provide better service through service
optimization and profile personalization. optimization and profile personalization.
Making an account on a website to use its services to their full extent can Making an account on a web site to use its services to their full extent can
be beneficial in some cases. Facebook, for example, allows its users to be beneficial in some cases. Facebook, for example, allows its users to
configure what they want to share with the public and their friends. Research configure what they want to share with the public and their friends. Research
has shown, however, that managing which posts get shown to whom is not as has shown, however, that managing which posts get shown to whom is not as
straightforward as one might think. straightforward as one might think.
\todo{Wrong chapter?} \citeauthor{liuAnalyzingFacebookPrivacy2011} \citeauthor{liuAnalyzingFacebookPrivacy2011}
\cite{liuAnalyzingFacebookPrivacy2011} conducted a survey where they asked \cite{liuAnalyzingFacebookPrivacy2011} conducted a survey where they asked
Facebook users about their desired privacy and visibility settings and Facebook users about their desired privacy and visibility settings and
cross-checked them with the actual settings they have used for their posts. The cross-checked them with the actual settings they have used for their posts. The
@ -236,11 +236,11 @@ A method which is most often associated with tracking on the Internet is
tracking with \gls{HTTP} cookies. Cookies are small files that are placed in the tracking with \gls{HTTP} cookies. Cookies are small files that are placed in the
browser's storage on the user's computer. They are limited to four kilobytes in browser's storage on the user's computer. They are limited to four kilobytes in
size and are generally used to identify and authenticate users and to store size and are generally used to identify and authenticate users and to store
website preferences. They were introduced to the web to allow stateful web site preferences. They were introduced to the web to allow stateful
information to be stored because the \gls{HTTP} is a stateless protocol and information to be stored because the \gls{HTTP} is a stateless protocol and
therefore does not have this capability. It is also a way of reducing the therefore does not have this capability. It is also a way of reducing the
server's load by not having to recompute states every time a user visits a server's load by not having to recompute states every time a user visits a
website. Shopping cart functionality for example can thus be implemented by web site. Shopping cart functionality for example can thus be implemented by
setting a cookie in the user's browser, saving the items which are currently setting a cookie in the user's browser, saving the items which are currently
added to the shopping cart and giving the user the possibility to resume added to the shopping cart and giving the user the possibility to resume
shopping at a later point provided that they do not delete their cookies. With shopping at a later point provided that they do not delete their cookies. With
@ -279,7 +279,7 @@ soon as the session is `torn down'. By adding an expiration date (demonstrated
in Listing~\ref{lst:permanent cookie header}) or a maximum age, the cookie in Listing~\ref{lst:permanent cookie header}) or a maximum age, the cookie
becomes permanent. Additionally, the domain attribute can be specified, meaning becomes permanent. Additionally, the domain attribute can be specified, meaning
that cookies which list a different domain than the origin are rejected by the that cookies which list a different domain than the origin are rejected by the
user agent \cite[Section 4.1.2.3]{barthHTTPStateManagement2011}. The same-origin user agent \cite[section 4.1.2.3]{barthHTTPStateManagement2011}. The same-origin
policy applies to cookies, disallowing access by other domains. policy applies to cookies, disallowing access by other domains.
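The difference between a session cookie and a permanent cookie can be illustrated with a small sketch. It uses Python's standard `http.cookies` module to build the value of a `Set-Cookie` header; the helper name `build_set_cookie`, the cookie name `userID` and the domain `example.com` are assumptions made for illustration and do not come from the thesis.

```python
from http.cookies import SimpleCookie
from typing import Optional


def build_set_cookie(name: str, value: str,
                     max_age: Optional[int] = None,
                     domain: Optional[str] = None) -> str:
    """Return the value of a Set-Cookie header for a single cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        # A maximum age (in seconds) turns a session cookie into a permanent one.
        cookie[name]["max-age"] = max_age
    if domain is not None:
        # The user agent rejects the cookie if this domain does not cover the origin.
        cookie[name]["domain"] = domain
    return cookie[name].OutputString()


# Hypothetical values: a session cookie and a one-year permanent cookie.
session_cookie = build_set_cookie("userID", "1234")
permanent_cookie = build_set_cookie("userID", "1234",
                                    max_age=31536000, domain="example.com")
print(session_cookie)
print(permanent_cookie)
```

Without `max_age` the header carries only the name-value pair, so the cookie is discarded when the session ends; with it, the user agent keeps the cookie until the maximum age has elapsed.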
\begin{listing} \begin{listing}
@ -308,7 +308,7 @@ Additionally, a length of more than 35 characters in the value field applies to
80\% of non-tracking cookies. \emph{Cookie Chunking}, where a cookie of larger 80\% of non-tracking cookies. \emph{Cookie Chunking}, where a cookie of larger
length is split into multiple cookies with smaller length, did not appear to length is split into multiple cookies with smaller length, did not appear to
affect detection by their method negatively. They also present a site affect detection by their method negatively. They also present a site
measurement of the Alexa Top 10,000 websites, finding that 46\% of websites use measurement of the Alexa Top 10,000 web sites, finding that 46\% of web sites use
third party tracking. More recent research third party tracking. More recent research
\cite{gonzalezCookieRecipeUntangling2017} has shown that tracking cookies do not \cite{gonzalezCookieRecipeUntangling2017} has shown that tracking cookies do not
have to be long lasting to accumulate data about users. Some cookies---like the have to be long lasting to accumulate data about users. Some cookies---like the
@ -332,7 +332,7 @@ detect and block cookies (see chapter~\ref{chap:defences against tracking}).
\label{subsec:flash cookies and java jnlp persistenceservice} \label{subsec:flash cookies and java jnlp persistenceservice}
Flash Cookies are similar to HTTP cookies in that they too are a store of Flash Cookies are similar to HTTP cookies in that they too are a store of
information that helps websites and servers to recognize already seen users. information that helps web sites and servers to recognize already seen users.
They are referred to as \glspl{LSO} by Adobe and are part of the Adobe Flash They are referred to as \glspl{LSO} by Adobe and are part of the Adobe Flash
Player runtime. Instead of storing data in the browser's storage, they have Player runtime. Instead of storing data in the browser's storage, they have
their own storage in a different location on the user's computer. Another their own storage in a different location on the user's computer. Another
@ -352,11 +352,11 @@ posed by \gls{HTTP} cookies and reacted by taking countermeasures.
\citeauthor{soltaniFlashCookiesPrivacy2009} \citeauthor{soltaniFlashCookiesPrivacy2009}
\cite{soltaniFlashCookiesPrivacy2009} were the first to report on the usage of \cite{soltaniFlashCookiesPrivacy2009} were the first to report on the usage of
Flash cookies by advertisers and popular websites. While surveying the top 100 Flash cookies by advertisers and popular web sites. While surveying the top 100
websites at the time, they found that 54\% of them used Flash cookies. Some web sites at the time, they found that 54\% of them used Flash cookies. Some
websites were setting Flash cookies as well as \gls{HTTP} cookies with the same web sites were setting Flash cookies as well as \gls{HTTP} cookies with the same
values, suggesting that Flash cookies serve as backup to \gls{HTTP} cookies. values, suggesting that Flash cookies serve as backup to \gls{HTTP} cookies.
Several websites were found using Flash cookies to respawn already deleted Several web sites were found using Flash cookies to respawn already deleted
\gls{HTTP} cookies, even across domains. \citeauthor{acarWebNeverForgets2014} \gls{HTTP} cookies, even across domains. \citeauthor{acarWebNeverForgets2014}
\cite{acarWebNeverForgets2014} automated detecting Flash cookies and access to \cite{acarWebNeverForgets2014} automated detecting Flash cookies and access to
them by monitoring file access with the GNU/Linux \emph{strace} tool them by monitoring file access with the GNU/Linux \emph{strace} tool
@ -367,10 +367,10 @@ top 100 sites use Flash cookies for respawning.
Even though Flash usage has declined during the last few years thanks to the Even though Flash usage has declined during the last few years thanks to the
development of the HTML5 standard, \citeauthor{buhovFLASH20thCentury2018} development of the HTML5 standard, \citeauthor{buhovFLASH20thCentury2018}
\cite{buhovFLASH20thCentury2018} have shown that despite major security flaws, \cite{buhovFLASH20thCentury2018} have shown that despite major security flaws,
Flash content is still served by 7.5\% of the top one million websites (2017). Flash content is still served by 7.5\% of the top one million web sites (2017).
The W3Techs Web Technology Survey shows a similar trend and also offers an The W3Techs Web Technology Survey shows a similar trend and also offers an
up-to-date measurement of 2.7\% of the top ten million websites for the year up-to-date measurement of 2.7\% of the top ten million web sites for the year
2020 \cite{w3techsHistoricalYearlyTrends2020}. Due to the security concerns in 2020 \cite{w3techsHistoricalYearlyTrends2020}. Due to the security concerns with
using Flash, Google's popular video sharing platform YouTube switched by default using Flash, Google's popular video sharing platform YouTube switched by default
to the HTML5 <video> tag in January of 2015 to the HTML5 <video> tag in January of 2015
\cite{youtubeengineeringYouTubeNowDefaults2015}. In 2017 Adobe announced that they \cite{youtubeengineeringYouTubeNowDefaults2015}. In 2017 Adobe announced that they
@ -389,13 +389,13 @@ injecting a Java applet into the \gls{DOM} of a page
\subsection{Evercookie} \subsection{Evercookie}
\label{subsec:evercookie} \label{subsec:evercookie}
Evercookie is JavaScript code that can be embedded in websites to Evercookie is JavaScript code that can be embedded in web sites to
permanently store information on the user's computer. When activated, permanently store information on the user's computer. When activated,
information is not only stored in standard \gls{HTTP} cookies but also in information is not only stored in standard \gls{HTTP} cookies but also in
various other places, providing redundancy where possible. A full list of various other places, providing redundancy where possible. A full list of
locations used by Evercookie can be found on the project's GitHub page locations used by Evercookie can be found on the project's GitHub page
\cite{kamkarSamykEvercookie2020}. In case the user wants to get rid of all \cite{kamkarSamykEvercookie2020}. In case the user wants to get rid of all
information stored by visiting a website that uses evercookies, every location information stored by visiting a web site that uses evercookies, every location
has to be cleared because if one remains, all the other cookies are restored. has to be cleared because if one remains, all the other cookies are restored.
The cookie deletion mechanisms that are provided by browsers by default do not The cookie deletion mechanisms that are provided by browsers by default do not
clear all locations where evercookies are stored, which makes evercookie almost clear all locations where evercookies are stored, which makes evercookie almost
@ -422,7 +422,7 @@ ways to accurately match an accumulated profile history of one identifier to
another. This problem has been solved by modern trackers by using a mechanism another. This problem has been solved by modern trackers by using a mechanism
called Cookie Synchronization or Cookie Matching. This technique allows multiple called Cookie Synchronization or Cookie Matching. This technique allows multiple
trackers to open an information sharing channel between each other without trackers to open an information sharing channel between each other without
necessarily having to know the website the user visits. necessarily having to know the web site the user visits.
\begin{figure}[ht] \begin{figure}[ht]
\centering \centering
@ -436,14 +436,14 @@ An example of how Cookie Synchronization works in practice is given in
Figure~\ref{fig:cookie synchronization}. The two parties that are interested in Figure~\ref{fig:cookie synchronization}. The two parties that are interested in
tracking the user are called \emph{cloudflare.com} and \emph{google.com} in this tracking the user are called \emph{cloudflare.com} and \emph{google.com} in this
example. The user they want to track is called \emph{browser}. \emph{Browser} example. The user they want to track is called \emph{browser}. \emph{Browser}
first visits \emph{website1.com} which loads JavaScript from first visits \emph{website1.com} which loads JavaScript from
\emph{cloudflare.com}. \emph{Cloudflare.com} sets a cookie in the browser with a \emph{cloudflare.com}. \emph{Cloudflare.com} sets a cookie in the browser with a
tracking identifier called \emph{userID = 1234}. Next, \emph{browser} visits tracking identifier called \emph{userID = 1234}. Next, \emph{browser} visits
another website called \emph{website2.com} which loads an advertisement banner another web site called \emph{website2.com} which loads an advertisement banner
from \emph{google.com}. \emph{Google.com} also sets a cookie with the tracking from \emph{google.com}. \emph{Google.com} also sets a cookie with the tracking
identifier \emph{userID = ABCD}. \emph{Browser} now has two cookies from two identifier \emph{userID = ABCD}. \emph{Browser} now has two cookies from two
different providers, each of them knowing the user under a different identifier. different providers, each of them knowing the user under a different identifier.
When \emph{browser} visits a third website called \emph{website3.com} which When \emph{browser} visits a third web site called \emph{website3.com} which
makes a request to \emph{cloudflare.com} and recognizes the user with the makes a request to \emph{cloudflare.com} and recognizes the user with the
identifier \emph{userID = 1234}, \emph{cloudflare.com} sends an \gls{HTTP} identifier \emph{userID = 1234}, \emph{cloudflare.com} sends an \gls{HTTP}
redirect, redirecting \emph{browser} to \emph{google.com}. The redirect also redirect, redirecting \emph{browser} to \emph{google.com}. The redirect also
@ -481,7 +481,7 @@ top 1000 (46\%) use Cookie Synchronization with at least one other party.
parties. \citeauthor{papadopoulosExclusiveHowSynced2018} show in parties. \citeauthor{papadopoulosExclusiveHowSynced2018} show in
\cite{papadopoulosExclusiveHowSynced2018} the threat that Cookie Synchronization \cite{papadopoulosExclusiveHowSynced2018} the threat that Cookie Synchronization
poses to encrypted \gls{TLS} sessions by performing the cookie-syncing over poses to encrypted \gls{TLS} sessions by performing the cookie-syncing over
unencrypted \gls{HTTP} even though the original request to the website was unencrypted \gls{HTTP} even though the original request to the web site was
encrypted. This highlights the serious privacy implications for users of encrypted. This highlights the serious privacy implications for users of
\gls{VPN} services trying to safeguard their traffic from a potentially \gls{VPN} services trying to safeguard their traffic from a potentially
malicious \gls{ISP}. malicious \gls{ISP}.
@ -499,11 +499,11 @@ settings in the Silverlight application. Silverlight's Isolated Storage is one
of the methods evercookie (section~\ref{subsec:evercookie}) uses to make of the methods evercookie (section~\ref{subsec:evercookie}) uses to make
permanent deletion of cookies hard to do and to facilitate cookie respawning. permanent deletion of cookies hard to do and to facilitate cookie respawning.
Usage of Silverlight has seen a steady decline since 2011 even though it has Usage of Silverlight has seen a steady decline since 2011 even though it has
been used by popular video streaming websites such as Netflix been used by popular video streaming web sites such as Netflix
\cite{NetflixBeginsRollOut2010} and Amazon. Microsoft did not include \cite{NetflixBeginsRollOut2010} and Amazon. Microsoft did not include
Silverlight support in Windows 8 and declared end-of-life in a blog post for Silverlight support in Windows 8 and declared end-of-life in a blog post for
October of 2021 \cite{SilverlightEndSupport2015}. Usage of Silverlight currently October of 2021 \cite{SilverlightEndSupport2015}. Usage of Silverlight currently
hovers around 0.04\% for the top 10 million websites hovers around 0.04\% for the top 10 million web sites
\cite{w3techsUsageStatisticsSilverlight2020}. \cite{w3techsUsageStatisticsSilverlight2020}.
\subsection{HTML5 Web Storage} \subsection{HTML5 Web Storage}
@ -529,7 +529,7 @@ applications. Due to it violating the same-origin policy, most major browsers
have not implemented Global Storage. have not implemented Global Storage.
Local Storage does, however, obey the same-origin policy by only allowing the Local Storage does, however, obey the same-origin policy by only allowing the
originating domain access to its name-value pairs. Every website has its own originating domain access to its name-value pairs. Every web site has its own
separate storage area which maintains a clear separation of concerns. Local separate storage area which maintains a clear separation of concerns. Local
Storage lends itself to different use cases. Especially applications that Storage lends itself to different use cases. Especially applications that
should function even when no internet connection exists can use Local Storage to should function even when no internet connection exists can use Local Storage to
@ -558,7 +558,7 @@ tracking domains.
\label{subsec:html5 indexed database api} \label{subsec:html5 indexed database api}
The need for client side storage to provide performant web applications that can The need for client side storage to provide performant web applications that can
also function offline, has prompted the inception of alternative methods to also function offline has prompted the inception of alternative methods to
store and retrieve information. Consequently, the development of the HTML5 store and retrieve information. Consequently, the development of the HTML5
standard has tried to fill that need by introducing HTML5 Web Storage and the standard has tried to fill that need by introducing HTML5 Web Storage and the
HTML5 Indexed Database \gls{API}. HTML5 Indexed Database \gls{API}.
@ -618,7 +618,7 @@ section~\ref{subsec:evercookie}) to add another layer of redundancy for storing
unique identifiers and respawning deleted ones. By performing static analysis on unique identifiers and respawning deleted ones. By performing static analysis on
a dataset provided by the \gls{HTTP} Archive project a dataset provided by the \gls{HTTP} Archive project
\cite{soudersAnnouncingHTTPArchive2011}, \citeauthor{belloroKnowWhatYou2018} \cite{soudersAnnouncingHTTPArchive2011}, \citeauthor{belloroKnowWhatYou2018}
found that 1.34\% of the surveyed websites use Web SQL Database in one of their found that 1.34\% of the surveyed web sites use Web SQL Database in one of their
subresources. 53.59\% of Web SQL Database usage is considered to come from subresources. 53.59\% of Web SQL Database usage is considered to come from
known tracking domains. This ratio is lower for the first 10K web sites as known tracking domains. This ratio is lower for the first 10K web sites as
determined by Alexa (in May 2018): 2.12\% use Web SQL Database and 39.9\% of determined by Alexa (in May 2018): 2.12\% use Web SQL Database and 39.9\% of
@ -641,20 +641,19 @@ A variety of caches exist and they are utilized for different purposes, leading
to different forms of information exploitability for tracking users. This to different forms of information exploitability for tracking users. This
section introduces methods which are in most cases not prevalent but are more section introduces methods which are in most cases not prevalent but are more
sophisticated and can thus be much harder to circumvent or block. sophisticated and can thus be much harder to circumvent or block.
\todo{Insert structure}
\subsection{Web Cache} \subsection{Web Cache}
\label{subsec:web cache} \label{subsec:web cache}
Using the \gls{DOM} \gls{API}'s \texttt{Window.getComputedStyle()} method, Using the \gls{DOM} \gls{API}'s \texttt{Window.getComputedStyle()} method,
websites were able to check a user's browsing history by utilizing the \gls{CSS} web sites were able to check a user's browsing history by utilizing the \gls{CSS}
\texttt{:visited} selector. Links can be coloured depending on whether they have \texttt{:visited} selector. Links can be coloured depending on whether they have
already been visited or not. The colours can be set by the website trying to already been visited or not. The colours can be set by the web site trying to
find out what the user's browsing history is. JavaScript would then be used to find out what the user's browsing history is. JavaScript would then be used to
generate links on the fly for websites that will be cross-checked with the generate links on the fly for web sites that will be cross-checked with the
contents of the browsing history. After generating links, a script can check the contents of the browsing history. After generating links, a script can check the
colour, compare it with the colour that has been set for visited and non-visited colour, compare it with the colour that has been set for visited and non-visited
websites and see if a website has already been visited or not. web sites and see if a web site has already been visited or not.
A solution to the problem has been proposed and subsequently implemented by A solution to the problem has been proposed and subsequently implemented by
\citeauthor{baronPreventingAttacksUser2010} \citeauthor{baronPreventingAttacksUser2010}
@ -680,14 +679,14 @@ attributed to a single user but to a group as a whole can be used to more
accurately identify members of said group. accurately identify members of said group.
One way of utilizing a web browser's cache to track users is to check One way of utilizing a web browser's cache to track users is to check
whether a website asset (e.g., an image or script) has already been cached by whether a web site asset (e.g., an image or script) has already been cached by
the user agent or not. If it has been cached, the website knows that it has been the user agent or not. If it has been cached, the web site knows that it has been
visited before, and if it has not been cached (the asset is downloaded from the visited before, and if it has not been cached (the asset is downloaded from the
server), the user agent is visiting for the first time. Another way is to embed server), the user agent is visiting for the first time. Another way is to embed
identifiers in cached documents. An \gls{HTML} file can contain an identifier identifiers in cached documents. An \gls{HTML} file can contain an identifier
which is stored in a \texttt{<div>} tag and is cached by the user agent. The which is stored in a \texttt{<div>} tag and is cached by the user agent. The
identifier can then be read from the cache on subsequent visits, even from third identifier can then be read from the cache on subsequent visits, even from third
party websites. party web sites.
\subsection{Cache Timing} \subsection{Cache Timing}
\label{subsec:cache timing} \label{subsec:cache timing}
@ -699,18 +698,18 @@ cryptography to indirectly observe the generation or usage of a cipher key by
measuring CPU noise, frequencies, power usage or other properties that allow measuring CPU noise, frequencies, power usage or other properties that allow
conclusions to be drawn about the key. This type of attack is referred to as a conclusions to be drawn about the key. This type of attack is referred to as a
side-channel attack. Cache timing exploits the fact that it takes time to load side-channel attack. Cache timing exploits the fact that it takes time to load
assets for a website. It works by measuring the time a client takes to access a assets for a web site. It works by measuring the time a client takes to access a
specified resource. If the time is short, the resource has most likely been specified resource. If the time is short, the resource has most likely been
served from the cache and has thus been downloaded before, implying a visit to a served from the cache and has thus been downloaded before, implying a visit to a
website which uses that resource. If it takes longer than a cache hit would, on web site which uses that resource. If it takes longer than a cache hit would, on
the other hand, the resource did not exist before and has to be downloaded now, the other hand, the resource did not exist before and has to be downloaded now,
suggesting that no other website using that resource has been visited before. In suggesting that no other web site using that resource has been visited before. In
practice an attack might look like this (taken from practice an attack might look like this (taken from
\cite[p.~2]{feltenTimingAttacksWeb2000}): \cite[p.~2]{feltenTimingAttacksWeb2000}):
\begin{enumerate} \begin{enumerate}
\item Alice visits a website from Bob called \texttt{bob.com}. \item Alice visits a web site from Bob called \texttt{bob.com}.
\item Bob wants to find out whether Alice visited Charlie's website \item Bob wants to find out whether Alice visited Charlie's web site
\texttt{charlie.com} in the past. \texttt{charlie.com} in the past.
\item Bob chooses a file from \texttt{charlie.com} which is regularly \item Bob chooses a file from \texttt{charlie.com} which is regularly
downloaded by visitors to that site. downloaded by visitors to that site.
@ -725,7 +724,7 @@ practice an attack might look like this (taken from
\end{enumerate} \end{enumerate}
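The core of the attack is a threshold decision on a measured load time. The following sketch simulates it under stated assumptions: `SimulatedBrowserCache`, the two latency values, the 50 ms threshold and the URLs are all hypothetical stand-ins and not taken from the cited paper, and a real attack would time an actual resource load in the victim's browser instead.

```python
class SimulatedBrowserCache:
    """Toy model of a browser cache that reports how long a load took."""

    def __init__(self, hit_latency: float = 0.002, miss_latency: float = 0.150):
        self._cached = set()
        self.hit_latency = hit_latency    # seconds for a cache hit (assumed)
        self.miss_latency = miss_latency  # seconds for a download (assumed)

    def load(self, url: str) -> float:
        """Load a resource and return the simulated time it took."""
        if url in self._cached:
            return self.hit_latency
        self._cached.add(url)  # a miss downloads the resource and caches it
        return self.miss_latency


def probably_visited(cache: SimulatedBrowserCache, url: str,
                     threshold: float = 0.05) -> bool:
    """Guess that the resource was cached if it loaded faster than the threshold."""
    return cache.load(url) < threshold


alice = SimulatedBrowserCache()
alice.load("https://charlie.com/logo.png")  # Alice visited charlie.com earlier

print(probably_visited(alice, "https://charlie.com/logo.png"))
print(probably_visited(alice, "https://dave.example/logo.png"))
```

Note that the probe itself populates the cache: probing the second URL again would report a hit, so each resource can only be measured reliably once per user.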
Bob can repeat this process for multiple resources and for every user that visits Bob can repeat this process for multiple resources and for every user that visits
his website, collecting browser history information on all of them. Since his web site, collecting browser history information on all of them. Since
caches exist to boost performance and avoid unnecessary loading of content from caches exist to boost performance and avoid unnecessary loading of content from
servers which has already been downloaded before, timing attacks are very hard servers which has already been downloaded before, timing attacks are very hard
to circumvent because caches exist solely for that purpose. Countermeasures to circumvent because caches exist solely for that purpose. Countermeasures
@ -741,13 +740,13 @@ miss performance and turning off Java and JavaScript but concluded that they
were unattractive or at worst ineffective. They propose a partial remedy for were unattractive or at worst ineffective. They propose a partial remedy for
cache timing by introducing \emph{Domain Tagging} which requires that resources cache timing by introducing \emph{Domain Tagging} which requires that resources
are tagged with the domain they have initially been loaded from. Once another are tagged with the domain they have initially been loaded from. Once another
website wants to determine whether a user has visited a site before by web site wants to determine whether a user has visited a site before by
cross-loading a resource, the domain does not match the tagged domain on the cross-loading a resource, the domain does not match the tagged domain on the
resource. If that is the case, the initial cache hit gets transformed into a resource. If that is the case, the initial cache hit gets transformed into a
cache miss and the resource has to be downloaded again, fooling the attacker cache miss and the resource has to be downloaded again, fooling the attacker
into believing that the origin website has not been visited before. It is into believing that the origin web site has not been visited before. It is
necessary to mention that at the time (2000) \glspl{CDN} were not as widely necessary to mention that at the time (2000) \glspl{CDN} were not as widely
used as today. Since websites rely on \glspl{CDN} to cache resources that are used as today. Since web sites rely on \glspl{CDN} to cache resources that are
used on multiple sites and can thus be served much faster from cache, domain used on multiple sites and can thus be served much faster from cache, domain
tagging would effectively nullify the performance boost a \gls{CDN} provides by tagging would effectively nullify the performance boost a \gls{CDN} provides by
converting every cache hit into a cache miss. The authors themselves question converting every cache hit into a cache miss. The authors themselves question
@ -768,10 +767,10 @@ discussed so far has not tackled the problem through a quantitative perspective
but instead focused on individual cases. Due to this missing piece, but instead focused on individual cases. Due to this missing piece,
\citeauthor{sanchez-rolaBakingTimerPrivacyAnalysis2019} \citeauthor{sanchez-rolaBakingTimerPrivacyAnalysis2019}
\cite{sanchez-rolaBakingTimerPrivacyAnalysis2019} conducted a survey on 10K \cite{sanchez-rolaBakingTimerPrivacyAnalysis2019} conducted a survey on 10K
websites to determine how feasible it is to perform a history sniffing attack on web sites to determine how feasible it is to perform a history sniffing attack on
a large scale. Their tool \textsc{BakingTimer} collects timing information on a large scale. Their tool \textsc{BakingTimer} collects timing information on
\gls{HTTP} requests, checking for logged-in status and sensitive data. Their \gls{HTTP} requests, checking for logged-in status and sensitive data. Their
results show that 71.07\% of the surveyed websites are vulnerable to the results show that 71.07\% of the surveyed web sites are vulnerable to the
attack. attack.
\subsection{Cache Control Directives} \subsection{Cache Control Directives}
@ -803,7 +802,7 @@ identifier has been placed in the \gls{ETag} header, the server can answer
requests to check for an updated resource always with an \gls{HTTP} 304 requests to check for an updated resource always with an \gls{HTTP} 304
Not Modified response, effectively persisting the unique identifier in the Not Modified response, effectively persisting the unique identifier in the
client's cache. During their 2011 survey of QuantCast.com's top 100 U.S. based client's cache. During their 2011 survey of QuantCast.com's top 100 U.S. based
websites \citeauthor{ayensonFlashCookiesPrivacy2011} web sites \citeauthor{ayensonFlashCookiesPrivacy2011}
\cite{ayensonFlashCookiesPrivacy2011} found \texttt{hulu.com} to be using \cite{ayensonFlashCookiesPrivacy2011} found \texttt{hulu.com} to be using
\glspl{ETag} as backup for tracking cookies that are set by \texttt{KISSmetrics} \glspl{ETag} as backup for tracking cookies that are set by \texttt{KISSmetrics}
(an analytics platform). This allowed cookies to be respawned once they had been (an analytics platform). This allowed cookies to be respawned once they had been
@ -830,11 +829,11 @@ own cache (e.g., browsers).
\citeauthor{kleinDNSCacheBasedUser2019} \cite{kleinDNSCacheBasedUser2019} \citeauthor{kleinDNSCacheBasedUser2019} \cite{kleinDNSCacheBasedUser2019}
demonstrated a tracking method which uses \gls{DNS} caches to assign unique demonstrated a tracking method which uses \gls{DNS} caches to assign unique
identifiers to client machines. In order for the technique to work, the tracker identifiers to client machines. In order for the technique to work, the tracker
has to have control over a web server as well as an authoritative \gls{DNS} has to have control over one web server (or multiple) as well as an
server which associates the web servers with a domain name under the control of authoritative \gls{DNS} server which associates the web servers with a domain
the tracker. The tracking process starts once a user agent requests a web site name under the control of the tracker. The tracking process starts once a user
which loads a script from one of the web servers the attacker is controlling. agent requests a web site which loads a script from one of the web servers the
The process can then be sketched out as follows (see attacker is controlling. The process can then be sketched out as follows (see
\cite[p.~5]{kleinDNSCacheBasedUser2019} for a detailed description). \cite[p.~5]{kleinDNSCacheBasedUser2019} for a detailed description).
\begin{enumerate} \begin{enumerate}

BIN
chapters/titlepage.pdf Normal file

Binary file not shown.


@ -19,6 +19,7 @@
\usepackage{xr} \usepackage{xr}
\usepackage[acronym]{glossaries} \usepackage[acronym]{glossaries}
\usepackage{lastpage} \usepackage{lastpage}
\usepackage{pdfpages}
\glsenablehyper \glsenablehyper
@ -87,8 +88,15 @@
\input{abbrev/acronym.tex} \input{abbrev/acronym.tex}
\includepdf[pages=-]{chapters/titlepage.pdf}
\newpage
\pagenumbering{roman} \pagenumbering{roman}
\subfile{chapters/erklaerung.tex}
\thispagestyle{frontmatter}
\subfile{chapters/abstract-de} \subfile{chapters/abstract-de}
\thispagestyle{frontmatter} \thispagestyle{frontmatter}
@ -104,7 +112,8 @@
\listoflistings \listoflistings
\thispagestyle{frontmatter} \thispagestyle{frontmatter}
\printglossaries \printglossary
\printglossary[type=\acronymtype]
\thispagestyle{frontmatter} \thispagestyle{frontmatter}
\subfile{chapters/introduction} \subfile{chapters/introduction}