Review text so far, add titlepage and erklaerung
parent 50e9bdac73
commit 3eb33ae783
6
.gitignore
vendored
@@ -18,3 +18,9 @@ main.pdf
 main.run.xml
 main.synctex.gz
 main.toc
+main.acr
+main.alg
+main.glg
+main.gls
+main.ilg
+main.ind
29
chapters/erklaerung.tex
Normal file
@@ -0,0 +1,29 @@
+\documentclass[../main.tex]{subfiles}
+
+\begin{document}
+
+\chapter*{Declaration of Authorship}
+
+\textsf{Tobias Eidelpes} \\
+
+I hereby declare that I have written this thesis independently, that I have
+fully specified the sources and resources used, and that I have marked all
+parts of the thesis---including tables, maps and figures---that are taken
+from other works or from the Internet, whether verbatim or in substance,
+as borrowed, in every case with a reference to the source.
+
+\vspace{2cm}
+
+\bigskip
+
+\begin{minipage}{0.55\textwidth}
+\textsf{Vienna, 31 March 2020} \\
+\end{minipage}
+\begin{minipage}{0.45\textwidth}
+\begin{tabular}{c}
+\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ \\
+\textsf{Tobias Eidelpes}
+\end{tabular}
+\end{minipage}
+
+\end{document}
@@ -27,7 +27,7 @@ identifiers.
 \section{Session-based Tracking Methods}
 \label{sec:session-based tracking methods}
 
-One of the simplest and most used forms of tracking on the Internet rely on
+One of the simplest and most used forms of tracking on the Internet relies on
 sessions. Since HTTP is a stateless protocol, web servers cannot by default keep
 track of any previous client requests. In order to implement specific features
 such as personalized advertising, some means to save current and recall previous
@@ -94,12 +94,12 @@ web \cite{westMeasuringPrivacyDisclosures2014}.
 \subsection{Hidden Form Fields}
 \label{subsec:hidden form fields}
 
-The \gls{HTML} provides a specification for form elements, which allow users to
+The \gls{HTML} provides a specification for form elements, which allows users to
 submit information (e.g., for authentication) to the server via POST or GET
 methods. Normally, a user would input data into a form and on clicking
 \emph{submit} the input would be sent to the server. Sometimes it is necessary
 to include additional information that the user did not enter. For this reason
-there exist \emph{hidden} web forms. Hidden web forms do not show on the website
+there exist \emph{hidden} web forms. Hidden web forms do not show on the web site
 and therefore the user cannot enter any information. Similar to \gls{URL}
 parameters, the value parameter in a hidden field contains additional
 information like the user's preferred language for example. Since almost
@@ -126,7 +126,7 @@ is sent to the server along with the data the user has filled in.
 \subsection{HTTP Referer}
 \label{subsec:http referer}
 
-Providers of web services often want to know where visitors to their website
+Providers of web services often want to know where visitors to their web site
 come from to understand more about their users and their browsing habits. The
 \gls{HTTP} specification accounts for this by introducing the \emph{\gls{HTTP}
 Referer field} [\emph{sic}] in the header. By checking the referrer, the server
@@ -147,7 +147,7 @@ identifiability of users on the web.
 \label{subsec:explicit authentication}
 
 Explicit authentication requires a user to \emph{explicitly} log in or register
-to the website. This way, specific resources are only available to the user when
+to the web site. This way, specific resources are only available to the user when
 he or she has authenticated themselves to the service. Actions taken on an
 authenticated user account are tied to that account and crafting a personal
 profile is more or less a built-in function in this case. Since merely asking a
@@ -167,12 +167,12 @@ efforts are not detected by the average user \cite{}, it is known that actions
 taken on an account are logged to provide better service through service
 optimization and profile personalization.
 
-Making an account on a website to use their services to their full extent, can
+Making an account on a web site to use their services to their full extent, can
 be beneficial in some cases. Facebook for example, allows their users to
 configure what they want to share with the public and their friends. Research
 has shown however, that managing which posts get shown to whom is not as
 straightforward as one might think.
-\todo{Wrong chapter?} \citeauthor{liuAnalyzingFacebookPrivacy2011}
+\citeauthor{liuAnalyzingFacebookPrivacy2011}
 \cite{liuAnalyzingFacebookPrivacy2011} conducted a survey where they asked
 Facebook users about their desired privacy and visibility settings and
 cross-checked them with the actual settings they have used for their posts. The
@@ -236,11 +236,11 @@ A method which is most often associated with tracking on the Internet is
 tracking with \gls{HTTP} cookies. Cookies are small files that are placed in the
 browser's storage on the user's computer. They are limited to four kilobytes in
 size and are generally used to identify and authenticate users and to store
-website preferences. They were introduced to the web to allow stateful
+web site preferences. They were introduced to the web to allow stateful
 information to be stored because the \gls{HTTP} is a stateless protocol and
 therefore does not have this capability. It is also a way of reducing the
 server's load by not having to recompute states every time a user visits a
-website. Shopping cart functionality for example can thus be implemented by
+web site. Shopping cart functionality for example can thus be implemented by
 setting a cookie in the user's browser, saving the items which are currently
 added to the shopping cart and giving the user the possibility to resume
 shopping at a later point provided that they do not delete their cookies. With
@@ -279,7 +279,7 @@ soon as the session is `torn down'. By adding an expiration date (demonstrated
 in Listing~\ref{lst:permanent cookie header}) or a maximum age, the cookie
 becomes permanent. Additionally, the domain attribute can be specified, meaning
 that cookies which list a different domain than the origin, are rejected by the
-user agent \cite[Section 4.1.2.3]{barthHTTPStateManagement2011}. The same-origin
+user agent \cite[section 4.1.2.3]{barthHTTPStateManagement2011}. The same-origin
 policy applies to cookies, disallowing access by other domains.
 
 \begin{listing}
@@ -308,7 +308,7 @@ Additionally, a length of more than 35 characters in the value field applies to
 80\% of non-tracking cookies. \emph{Cookie Chunking}, where a cookie of larger
 length is split into multiple cookies with smaller length, did not appear to
 affect detection by their method negatively. They also present a site
-measurement of the Alexa Top 10,000 websites, finding that 46\% of websites use
+measurement of the Alexa Top 10,000 web sites, finding that 46\% of web sites use
 third party tracking. More recent research
 \cite{gonzalezCookieRecipeUntangling2017} has shown that tracking cookies do not
 have to be long lasting to accumulate data about users. Some cookies---like the
@@ -332,7 +332,7 @@ detect and block cookies (see chapter~\ref{chap:defences against tracking}).
 \label{subsec:flash cookies and java jnlp persistenceservice}
 
 Flash Cookies are similar to HTTP cookies in that they too are a store of
-information that helps websites and servers to recognize already seen users.
+information that helps web sites and servers to recognize already seen users.
 They are referred to as \glspl{LSO} by Adobe and are part of the Adobe Flash
 Player runtime. Instead of storing data in the browser's storage, they have
 their own storage in a different location on the user's computer. Another
@@ -352,11 +352,11 @@ posed by \gls{HTTP} cookies and reacted by taking countermeasures.
 
 \citeauthor{soltaniFlashCookiesPrivacy2009}
 \cite{soltaniFlashCookiesPrivacy2009} were the first to report on the usage of
-Flash cookies by advertisers and popular websites. While surveying the top 100
-websites at the time, they found that 54\% of them used Flash cookies. Some
-websites were setting Flash cookies as well as \gls{HTTP} cookies with the same
+Flash cookies by advertisers and popular web sites. While surveying the top 100
+web sites at the time, they found that 54\% of them used Flash cookies. Some
+web sites were setting Flash cookies as well as \gls{HTTP} cookies with the same
 values, suggesting that Flash cookies serve as backup to \gls{HTTP} cookies.
-Several websites were found using Flash cookies to respawn already deleted
+Several web sites were found using Flash cookies to respawn already deleted
 \gls{HTTP} cookies, even across domains. \citeauthor{acarWebNeverForgets2014}
 \cite{acarWebNeverForgets2014} automated detecting Flash cookies and access to
 them by monitoring file access with the GNU/Linux \emph{strace} tool
@@ -367,10 +367,10 @@ top 100 sites use Flash cookies for respawning.
 Even though Flash usage has declined during the last few years thanks to the
 development of the HTML5 standard, \citeauthor{buhovFLASH20thCentury2018}
 \cite{buhovFLASH20thCentury2018} have shown that despite major security flaws,
-Flash content is still served by 7.5\% of the top one million websites (2017).
+Flash content is still served by 7.5\% of the top one million web sites (2017).
 The W3Techs Web Technology Survey shows a similar trend and also offers an
-up-to-date measurement of 2.7\% of the top ten million websites for the year
-2020 \cite{w3techsHistoricalYearlyTrends2020}. Due to the security concerns in
+up-to-date measurement of 2.7\% of the top ten million web sites for the year
+2020 \cite{w3techsHistoricalYearlyTrends2020}. Due to the security concerns with
 using Flash, Google's popular video sharing platform YouTube switched by default
 to the HTML5 <video> tag in January of 2015
 \cite{youtubeengineeringYouTubeNowDefaults2015}. In 2017 Adobe announced that they
@@ -389,13 +389,13 @@ injecting a Java applet into the \gls{DOM} of a page
 \subsection{Evercookie}
 \label{subsec:evercookie}
 
-Evercookie is JavaScript code that can be embedded in websites which allows to
+Evercookie is JavaScript code that can be embedded in web sites which allows to
 permanently store information on the user's computer. When activated,
 information is not only stored in standard \gls{HTTP} cookies but also in
 various other places, providing redundancy where possible. A full list of
 locations used by Evercookie can be found on the project's GitHub page
 \cite{kamkarSamykEvercookie2020}. In case the user wants to get rid of all
-information stored by visiting a website that uses evercookies, every location
+information stored by visiting a web site that uses evercookies, every location
 has to be cleared because if one remains, all the other cookies are restored.
 The cookie deletion mechanisms that are provided by browsers by default do not
 clear all locations where evercookies are stored, which makes evercookie almost
@@ -422,7 +422,7 @@ ways to accurately match an accumulated profile history of one identifier to
 another. This problem has been solved by modern trackers by using a mechanism
 called Cookie Synchronization or Cookie Matching. This technique allows multiple
 trackers to open an information sharing channel between each other without
-necessarily having to know the website the user visits.
+necessarily having to know the web site the user visits.
 
 \begin{figure}[ht]
 \centering
@@ -436,14 +436,14 @@ An example of how Cookie Synchronization works in practice is given in
 Figure~\ref{fig:cookie synchronization}. The two parties that are interested in
 tracking the user are called \emph{cloudflare.com} and \emph{google.com} in this
 example. The user they want to track is called \emph{browser}. \emph{Browser}
 first visits \emph{website1.com} which loads JavaScript from
 \emph{cloudflare.com}. \emph{Cloudflare.com} sets a cookie in the browser with a
 tracking identifier called \emph{userID = 1234}. Next, \emph{browser} visits
-another website called \emph{website2.com} which loads an advertisement banner
+another web site called \emph{website2.com} which loads an advertisement banner
 from \emph{google.com}. \emph{Google.com} also sets a cookie with the tracking
 identifier \emph{userID = ABCD}. \emph{Browser} has now two cookies from two
 different providers, each of them knowing the user under a different identifier.
-When \emph{browser} visits a third website called \emph{website3.com} which
+When \emph{browser} visits a third web site called \emph{website3.com} which
 makes a request to \emph{cloudflare.com} and recognizes the user with the
 identifier \emph{userID = 1234}, \emph{cloudflare.com} sends an \gls{HTTP}
 redirect, redirecting \emph{browser} to \emph{google.com}. The redirect also
@@ -481,7 +481,7 @@ top 1000 (46\%) use Cookie Synchronization with at least one other party.
 parties. \citeauthor{papadopoulosExclusiveHowSynced2018} show in
 \cite{papadopoulosExclusiveHowSynced2018} the threat that Cookie Synchronization
 poses to encrypted \gls{TLS} sessions by performing the cookie-syncing over
-unencrypted \gls{HTTP} even though the original request to the website was
+unencrypted \gls{HTTP} even though the original request to the web site was
 encrypted. This highlights the serious privacy implications for users of
 \gls{VPN} services trying to safeguard their traffic from a potentially
 malicious \gls{ISP}.
@@ -499,11 +499,11 @@ settings in the Silverlight application. Silverlight's Isolated Storage is one
 of the methods evercookie (section~\ref{subsec:evercookie}) uses to make
 permanent deletion of cookies hard to do and to facilitate cookie respawning.
 Usage of Silverlight has seen a steady decline since 2011 even though it has
-been used by popular video streaming websites such as Netflix
+been used by popular video streaming web sites such as Netflix
 \cite{NetflixBeginsRollOut2010} and Amazon. Microsoft did not include
 Silverlight support in Windows 8 and declared end-of-life in a blog post for
 October of 2021 \cite{SilverlightEndSupport2015}. Usage of Silverlight currently
-hovers around 0.04\% for the top 10 million websites
+hovers around 0.04\% for the top 10 million web sites
 \cite{w3techsUsageStatisticsSilverlight2020}.
 
 \subsection{HTML5 Web Storage}
@@ -529,7 +529,7 @@ applications. Due to it violating the same-origin policy, most major browsers
 have not implemented Global Storage.
 
 Local Storage does, however, obey the same-origin policy by only allowing the
-originating domain access to its name-value pairs. Every website has their own
+originating domain access to its name-value pairs. Every web site has their own
 separate storage area which maintains a clear separation of concerns. Local
 Storage lends itself for different use cases. Especially applications that
 should function even when no internet connection exists can use Local Storage to
@@ -558,7 +558,7 @@ tracking domains.
 \label{subsec:html5 indexed database api}
 
 The need for client side storage to provide performant web applications that can
-also function offline, has prompted the inception of alternative methods to
+also function offline has prompted the inception of alternative methods to
 store and retrieve information. Consequently, the development of the HTML5
 standard has tried to fill that need by introducing HTML5 Web Storage and the
 HTML5 Indexed Database \gls{API}.
@@ -618,7 +618,7 @@ section~\ref{subsec:evercookie}) to add another layer of redundancy for storing
 unique identifiers and respawning deleted ones. By performing static analysis on
 a dataset provided by the \gls{HTTP} Archive project
 \cite{soudersAnnouncingHTTPArchive2011}, \citeauthor{belloroKnowWhatYou2018}
-found that 1.34\% of the surveyed websites use Web SQL Database in one of their
+found that 1.34\% of the surveyed web sites use Web SQL Database in one of their
 subresources. 53.59\% of Web SQL Database usage are considered to be coming from
 known tracking domains. This ratio is lower for the first 10K web sites as
 determined by Alexa (in May 2018): 2.12\% use Web SQL Database and 39.9\% of
@@ -641,20 +641,19 @@ A variety of caches exist and they are utilized for different purposes, leading
 to different forms of information exploitability for tracking users. This
 section introduces methods which are in most cases not prevalent but are more
 sophisticated and can thus be much harder to circumvent or block.
-\todo{Insert structure}
 
 \subsection{Web Cache}
 \label{subsec:web cache}
 
 Using the \gls{DOM} \gls{API}'s \texttt{Window.getComputedStyle()} method,
-websites were able to check a user's browsing history by utilizing the \gls{CSS}
+web sites were able to check a user's browsing history by utilizing the \gls{CSS}
 \texttt{:visited} selector. Links can be coloured depending on whether they have
-already been visited or not. The colours can be set by the website trying to
+already been visited or not. The colours can be set by the web site trying to
 find out what the user's browsing history is. JavaScript would then be used to
-generate links on the fly for websites that will be cross-checked with the
+generate links on the fly for web sites that will be cross-checked with the
 contents of the browsing history. After generating links, a script can check the
 colour, compare it with the colour that has been set for visited and non-visited
-websites and see if a website has already been visited or not.
+web sites and see if a web site has already been visited or not.
 
 A solution to the problem has been proposed and subsequently implemented by
 \citeauthor{baronPreventingAttacksUser2010}
@@ -680,14 +679,14 @@ attributed to a single user but to a group as a whole can be used to more
 accurately identify members of said group.
 
 Other ways of utilizing a web browser's cache to track users are tracking
-whether a website asset (e.g., an image or script) has already been cached by
-the user agent or not. If it has been cached, the website knows that it has been
+whether a web site asset (e.g., an image or script) has already been cached by
+the user agent or not. If it has been cached, the web site knows that it has been
 visited before and if it has not been cached (the asset is downloaded from the
 server), the user agent visits for the first time. Another way is to embed
 identifiers in cached documents. An \gls{HTML} file can contain an identifier
 which is stored in a \texttt{<div>} tag and is cached by the user agent. The
 identifier can then be read from the cache on subsequent visits, even from third
-party websites.
+party web sites.
 
 \subsection{Cache Timing}
 \label{subsec:cache timing}
@@ -699,18 +698,18 @@ cryptography to indirectly observe the generation or usage of a cipher key by
 measuring CPU noise, frequencies, power usage or other properties that allow
 conclusions to be drawn about the key. This type of attack is referred to as a
 side-channel attack. Cache timing exploits the fact that it takes time to load
-assets for a website. It works by measuring the time a client takes to access a
+assets for a web site. It works by measuring the time a client takes to access a
 specified resource. If the time is short, the resource has most likely been
 served from the cache and has thus been downloaded before, implying a visit to a
-website which uses that resource. If it takes longer than a cache hit would, on
+web site which uses that resource. If it takes longer than a cache hit would, on
 the other hand, the resource did not exist before and has to be downloaded now,
-suggesting that no other website using that resource has been visited before. In
+suggesting that no other web site using that resource has been visited before. In
 practice an attack might look like this (taken from
 \cite[p.~2]{feltenTimingAttacksWeb2000}):
 
 \begin{enumerate}
-\item Alice visits a website from Bob called \texttt{bob.com}.
-\item Bob wants to find out whether Alice visited Charlie's website
+\item Alice visits a web site from Bob called \texttt{bob.com}.
+\item Bob wants to find out whether Alice visited Charlie's web site
 \texttt{charlie.com} in the past.
 \item Bob chooses a file from \texttt{charlie.com} which is regularly
 downloaded by visitors to that site.
|
downloaded by visitors to that site.
|
||||||
@ -725,7 +724,7 @@ practice an attack might look like this (taken from
|
|||||||
\end{enumerate}
|
\end{enumerate}
|
||||||
|
|
||||||
Bob can repeat this process for multiple resources and for every user that visits
|
Bob can repeat this process for multiple resources and for every user that visits
|
||||||
his website, collecting browser history information on all of them. Since
|
his web site, collecting browser history information on all of them. Since
|
||||||
caches exist to boost performance and avoid unnecessary loading of content from
|
caches exist to boost performance and avoid unnecessary loading of content from
|
||||||
servers which has already been downloaded before, timing attacks are very hard
|
servers which has already been downloaded before, timing attacks are very hard
|
||||||
to circumvent because caches exist solely for that purpose. Countermeasures
|
to circumvent because caches exist solely for that purpose. Countermeasures
|
||||||
@ -741,13 +740,13 @@ miss performance and turning off Java and JavaScript but concluded that they
|
|||||||
were unattractive or at worst ineffective. They propose a partial remedy for
|
were unattractive or at worst ineffective. They propose a partial remedy for
|
||||||
cache timing by introducing \emph{Domain Tagging} which requires that resources
|
cache timing by introducing \emph{Domain Tagging} which requires that resources
|
||||||
are tagged with the domain they have initially been loaded from. Once another
|
are tagged with the domain they have initially been loaded from. Once another
|
||||||
website wants to determine whether a user has visited a site before by
|
web site wants to determine whether a user has visited a site before by
|
||||||
cross-loading a resource, the domain does not match the tagged domain on the
|
cross-loading a resource, the domain does not match the tagged domain on the
|
||||||
resource. If that is the case, the initial cache hit gets transformed into a
|
resource. If that is the case, the initial cache hit gets transformed into a
|
||||||
cache miss and the resource has to be downloaded again, fooling the attacker
|
cache miss and the resource has to be downloaded again, fooling the attacker
|
||||||
into believing that the origin website has not been visited before. It is
|
into believing that the origin web site has not been visited before. It is
|
||||||
necessary to mention that at the time (2000) \glspl{CDN} were not as widely
|
necessary to mention that at the time (2000) \glspl{CDN} were not as widely
|
||||||
used as today. Since websites rely on \glspl{CDN} to cache resources that are
|
used as today. Since web sites rely on \glspl{CDN} to cache resources that are
|
||||||
used on multiple sites and can thus be served much faster from cache, domain
|
used on multiple sites and can thus be served much faster from cache, domain
|
||||||
tagging would effectively nullify the performance boost a \gls{CDN} provides by
|
tagging would effectively nullify the performance boost a \gls{CDN} provides by
|
||||||
converting every cache hit into a cache miss. The authors themselves question
|
converting every cache hit into a cache miss. The authors themselves question
|
||||||
@ -768,10 +767,10 @@ discussed so far has not tackled the problem through a quantitative perspective
|
|||||||
but instead focused on individual cases. Due to this missing piece,
|
but instead focused on individual cases. Due to this missing piece,
|
||||||
\citeauthor{sanchez-rolaBakingTimerPrivacyAnalysis2019}
|
\citeauthor{sanchez-rolaBakingTimerPrivacyAnalysis2019}
|
||||||
\cite{sanchez-rolaBakingTimerPrivacyAnalysis2019} conducted a survey on 10K
|
\cite{sanchez-rolaBakingTimerPrivacyAnalysis2019} conducted a survey on 10K
|
||||||
websites to determine how feasible it is to perform a history sniffing attack on
|
web sites to determine how feasible it is to perform a history sniffing attack on
|
||||||
a large scale. Their tool \textsc{BakingTimer} collects timing information on
|
a large scale. Their tool \textsc{BakingTimer} collects timing information on
|
||||||
\gls{HTTP} requests, checking for logged-in status and sensitive data. Their
|
\gls{HTTP} requests, checking for logged-in status and sensitive data. Their
|
||||||
results show that 71.07\% of the surveyed websites are vulnerable to the
|
results show that 71.07\% of the surveyed web sites are vulnerable to the
|
||||||
attack.
|
attack.
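
The timing check described in this subsection can be sketched as follows. This
is only a minimal illustration, not the procedure used by the cited authors: the
function names, the URLs and all latency values are hypothetical, and the
measurements are hard-coded instead of being taken from a real browser. The
sketch assumes that Bob has already measured typical cache hit and cache miss
latencies and only needs to classify new measurements against a threshold.

```python
from statistics import median


def calibrate_threshold(hit_samples_ms, miss_samples_ms):
    """Pick a cut-off halfway between the typical hit and miss latency."""
    return (median(hit_samples_ms) + median(miss_samples_ms)) / 2


def was_cached(load_time_ms, threshold_ms):
    """A load faster than the threshold is treated as a cache hit."""
    return load_time_ms < threshold_ms


def infer_history(measurements_ms, threshold_ms):
    """Map every probed resource to a guess on whether it was cached."""
    return {
        url: was_cached(load_time, threshold_ms)
        for url, load_time in measurements_ms.items()
    }


# Hypothetical calibration data in milliseconds (made up for illustration).
threshold = calibrate_threshold([2.0, 3.0, 4.0], [80.0, 120.0, 100.0])

# Hypothetical load times Bob observed for Alice's user agent.
history = infer_history(
    {"charlie.com/logo.png": 3.5, "dave.example/app.js": 95.0},
    threshold,
)
```

With these made-up numbers the resource from \texttt{charlie.com} is classified
as a cache hit, which suggests a previous visit, while the second resource is
classified as a cache miss. A real attack would have to cope with network
jitter, so a single fixed threshold is a simplification.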
|
||||||
|
|
||||||
\subsection{Cache Control Directives}
|
\subsection{Cache Control Directives}
|
||||||
@ -803,7 +802,7 @@ identifier has been placed in the \gls{ETag} header, the server can answer
|
|||||||
requests to check for an updated resource always with an \gls{HTTP} 304
|
requests to check for an updated resource always with an \gls{HTTP} 304
|
||||||
Not-Modified header, effectively persisting the unique identifier in the
|
Not-Modified header, effectively persisting the unique identifier in the
|
||||||
client's cache. During their 2011 survey of QuantCast.com's top 100 U.S. based
|
client's cache. During their 2011 survey of QuantCast.com's top 100 U.S. based
|
||||||
websites \citeauthor{ayensonFlashCookiesPrivacy2011}
|
web sites \citeauthor{ayensonFlashCookiesPrivacy2011}
|
||||||
\cite{ayensonFlashCookiesPrivacy2011} found \texttt{hulu.com} to be using
|
\cite{ayensonFlashCookiesPrivacy2011} found \texttt{hulu.com} to be using
|
||||||
\glspl{ETag} as backup for tracking cookies that are set by \texttt{KISSmetrics}
|
\glspl{ETag} as backup for tracking cookies that are set by \texttt{KISSmetrics}
|
||||||
(an analytics platform). This allowed cookies to be respawned once they had been
|
(an analytics platform). This allowed cookies to be respawned once they had been
|
||||||
@ -830,11 +829,11 @@ own cache (e.g., browsers).
|
|||||||
\citeauthor{kleinDNSCacheBasedUser2019} \cite{kleinDNSCacheBasedUser2019}
|
\citeauthor{kleinDNSCacheBasedUser2019} \cite{kleinDNSCacheBasedUser2019}
|
||||||
demonstrated a tracking method which uses \gls{DNS} caches to assign unique
|
demonstrated a tracking method which uses \gls{DNS} caches to assign unique
|
||||||
identifiers to client machines. In order for the technique to work, the tracker
|
identifiers to client machines. In order for the technique to work, the tracker
|
||||||
has to have control over a web server as well as an authoritative \gls{DNS}
|
has to have control over one web server (or multiple) as well as an
|
||||||
server which associates the web servers with a domain name under the control of
|
authoritative \gls{DNS} server which associates the web servers with a domain
|
||||||
the tracker. The tracking process starts once a user agent requests a web site
|
name under the control of the tracker. The tracking process starts once a user
|
||||||
which loads a script from one of the web servers the attacker is controlling.
|
agent requests a web site which loads a script from one of the web servers the
|
||||||
The process can then be sketched out as follows (see
|
attacker is controlling. The process can then be sketched out as follows (see
|
||||||
\cite[p.~5]{kleinDNSCacheBasedUser2019} for a detailed description).
|
\cite[p.~5]{kleinDNSCacheBasedUser2019} for a detailed description).
|
||||||
|
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
|
|||||||
BIN
chapters/titlepage.pdf
Normal file
Binary file not shown.
11
main.tex
@ -19,6 +19,7 @@
|
|||||||
\usepackage{xr}
|
\usepackage{xr}
|
||||||
\usepackage[acronym]{glossaries}
|
\usepackage[acronym]{glossaries}
|
||||||
\usepackage{lastpage}
|
\usepackage{lastpage}
|
||||||
|
\usepackage{pdfpages}
|
||||||
|
|
||||||
\glsenablehyper
|
\glsenablehyper
|
||||||
|
|
||||||
@ -87,8 +88,15 @@
|
|||||||
|
|
||||||
\input{abbrev/acronym.tex}
|
\input{abbrev/acronym.tex}
|
||||||
|
|
||||||
|
\includepdf[pages=-]{chapters/titlepage.pdf}
|
||||||
|
|
||||||
|
\newpage
|
||||||
|
|
||||||
\pagenumbering{roman}
|
\pagenumbering{roman}
|
||||||
|
|
||||||
|
\subfile{chapters/erklaerung.tex}
|
||||||
|
\thispagestyle{frontmatter}
|
||||||
|
|
||||||
\subfile{chapters/abstract-de}
|
\subfile{chapters/abstract-de}
|
||||||
\thispagestyle{frontmatter}
|
\thispagestyle{frontmatter}
|
||||||
|
|
||||||
@ -104,7 +112,8 @@
|
|||||||
\listoflistings
|
\listoflistings
|
||||||
\thispagestyle{frontmatter}
|
\thispagestyle{frontmatter}
|
||||||
|
|
||||||
\printglossaries
|
\printglossary
|
||||||
|
\printglossary[type=\acronymtype]
|
||||||
\thispagestyle{frontmatter}
|
\thispagestyle{frontmatter}
|
||||||
|
|
||||||
\subfile{chapters/introduction}
|
\subfile{chapters/introduction}
|
||||||
|
|||||||