Review text so far, add titlepage and erklaerung

2020-03-31 17:49:05 +02:00 · 2020-03-31 17:49:05 +02:00 · 3eb33ae783
commit 3eb33ae783
parent 50e9bdac73
5 changed files with 100 additions and 57 deletions
--- a/.gitignore
+++ b/.gitignore
@ -18,3 +18,9 @@ main.pdf
 main.run.xml
 main.synctex.gz
 main.toc
 main.acr
 main.alg
 main.glg
 main.gls
 main.ilg
 main.ind
--- a/chapters/erklaerung.tex
+++ b/chapters/erklaerung.tex
@ -0,0 +1,29 @@
 \documentclass[../main.tex]{subfiles}
 \begin{document}
 \chapter*{Declaration of Authorship}
 \textsf{Tobias Eidelpes} \\
 I hereby declare that I have written this thesis independently, that I have
 fully specified all sources and aids used, and that I have marked all parts
 of the thesis, including tables, maps and figures, that are taken from other
 works or from the Internet, either verbatim or in substance, as borrowed
 material, in every case citing the source.
 \vspace{2cm}
 \bigskip
 \begin{minipage}{0.55\textwidth}
 	\textsf{Vienna, 31 March 2020} \\
 \end{minipage}
 \begin{minipage}{0.45\textwidth}
 \begin{tabular}{c}
 \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ \\
 \textsf{Tobias Eidelpes}
 \end{tabular}
 \end{minipage}
 \end{document}
--- a/chapters/methods.tex
+++ b/chapters/methods.tex
@ -27,7 +27,7 @@ identifiers.
 \section{Session-based Tracking Methods}
 \label{sec:session-based tracking methods}
-One of the simplest and most used forms of tracking on the Internet rely on
+One of the simplest and most used forms of tracking on the Internet relies on
 sessions. Since HTTP is a stateless protocol, web servers cannot by default keep
 track of any previous client requests. In order to implement specific features
 such as personalized advertising, some means to save current and recall previous
@ -94,12 +94,12 @@ web \cite{westMeasuringPrivacyDisclosures2014}.
 \subsection{Hidden Form Fields}
 \label{subsec:hidden form fields}
-The \gls{HTML} provides a specification for form elements, which allow users to
+The \gls{HTML} provides a specification for form elements, which allows users to
 submit information (e.g., for authentication) to the server via POST or GET
 methods. Normally, a user would input data into a form and on clicking
 \emph{submit} the input would be sent to the server. Sometimes it is necessary
 to include additional information that the user did not enter. For this reason
-there exist \emph{hidden} web forms. Hidden web forms do not show on the website
+there exist \emph{hidden} web forms. Hidden web forms do not show on the web site
 and therefore the user cannot enter any information. Similar to \gls{URL}
 parameters, the value parameter in a hidden field contains additional
 information like the user's preferred language for example. Since almost
@ -126,7 +126,7 @@ is sent to the server along with the data the user has filled in.
 \subsection{HTTP Referer}
 \label{subsec:http referer}
-Providers of web services often want to know where visitors to their website
+Providers of web services often want to know where visitors to their web site
 come from to understand more about their users and their browsing habits. The
 \gls{HTTP} specification accounts for this by introducing the \emph{\gls{HTTP}
 Referer field} [\emph{sic}] in the header. By checking the referrer, the server
@ -147,7 +147,7 @@ identifiability of users on the web.
 \label{subsec:explicit authentication}
 Explicit authentication requires a user to \emph{explicitly} log in or register
-to the website. This way, specific resources are only available to the user when
+to the web site. This way, specific resources are only available to the user when
 he or she has authenticated themselves to the service. Actions taken on an
 authenticated user account are tied to that account and crafting a personal
 profile is more or less a built-in function in this case. Since merely asking a
@ -167,12 +167,12 @@ efforts are not detected by the average user \cite{}, it is known that actions
 taken on an account are logged to provide better service through service
 optimization and profile personalization.
-Making an account on a website to use their services to their full extent, can
+Making an account on a web site to use its services to their full extent can
 be beneficial in some cases. Facebook, for example, allows its users to
 configure what they want to share with the public and their friends. Research
 has shown, however, that managing which posts get shown to whom is not as
 straightforward as one might think.
-\todo{Wrong chapter?} \citeauthor{liuAnalyzingFacebookPrivacy2011}
+\citeauthor{liuAnalyzingFacebookPrivacy2011}
 \cite{liuAnalyzingFacebookPrivacy2011} conducted a survey where they asked
 Facebook users about their desired privacy and visibility settings and
 cross-checked them with the actual settings they have used for their posts. The
@ -236,11 +236,11 @@ A method which is most often associated with tracking on the Internet is
 tracking with \gls{HTTP} cookies. Cookies are small files that are placed in the
 browser's storage on the user's computer. They are limited to four kilobytes in
 size and are generally used to identify and authenticate users and to store
-website preferences. They were introduced to the web to allow stateful
+web site preferences. They were introduced to the web to allow stateful
 information to be stored because the \gls{HTTP} is a stateless protocol and
 therefore does not have this capability. It is also a way of reducing the
 server's load by not having to recompute states every time a user visits a
-website. Shopping cart functionality for example can thus be implemented by
+web site. Shopping cart functionality for example can thus be implemented by
 setting a cookie in the user's browser, saving the items which are currently
 added to the shopping cart and giving the user the possibility to resume
 shopping at a later point provided that they do not delete their cookies. With
@ -279,7 +279,7 @@ soon as the session is `torn down'. By adding an expiration date (demonstrated
 in Listing~\ref{lst:permanent cookie header}) or a maximum age, the cookie
 becomes permanent. Additionally, the domain attribute can be specified, meaning
 that cookies which list a different domain than the origin are rejected by the
-user agent \cite[Section 4.1.2.3]{barthHTTPStateManagement2011}. The same-origin
+user agent \cite[section 4.1.2.3]{barthHTTPStateManagement2011}. The same-origin
 policy applies to cookies, disallowing access by other domains.
 \begin{listing}
@ -308,7 +308,7 @@ Additionally, a length of more than 35 characters in the value field applies to
 80\% of non-tracking cookies. \emph{Cookie Chunking}, where a cookie of larger
 length is split into multiple cookies with smaller length, did not appear to
 affect detection by their method negatively. They also present a site
-measurement of the Alexa Top 10,000 websites, finding that 46\% of websites use
+measurement of the Alexa Top 10,000 web sites, finding that 46\% of web sites use
 third party tracking. More recent research
 \cite{gonzalezCookieRecipeUntangling2017} has shown that tracking cookies do not
 have to be long lasting to accumulate data about users. Some cookies---like the
@ -332,7 +332,7 @@ detect and block cookies (see chapter~\ref{chap:defences against tracking}).
 \label{subsec:flash cookies and java jnlp persistenceservice}
 Flash Cookies are similar to HTTP cookies in that they too are a store of
-information that helps websites and servers to recognize already seen users.
+information that helps web sites and servers to recognize already seen users.
 They are referred to as \glspl{LSO} by Adobe and are part of the Adobe Flash
 Player runtime. Instead of storing data in the browser's storage, they have
 their own storage in a different location on the user's computer. Another
@ -352,11 +352,11 @@ posed by \gls{HTTP} cookies and reacted by taking countermeasures.
 \citeauthor{soltaniFlashCookiesPrivacy2009}
 \cite{soltaniFlashCookiesPrivacy2009} were the first to report on the usage of
-Flash cookies by advertisers and popular websites. While surveying the top 100
+Flash cookies by advertisers and popular web sites. While surveying the top 100
-websites at the time, they found that 54\% of them used Flash cookies. Some
+web sites at the time, they found that 54\% of them used Flash cookies. Some
-websites were setting Flash cookies as well as \gls{HTTP} cookies with the same
+web sites were setting Flash cookies as well as \gls{HTTP} cookies with the same
 values, suggesting that Flash cookies serve as backup to \gls{HTTP} cookies.
-Several websites were found using Flash cookies to respawn already deleted
+Several web sites were found using Flash cookies to respawn already deleted
 \gls{HTTP} cookies, even across domains. \citeauthor{acarWebNeverForgets2014}
 \cite{acarWebNeverForgets2014} automated detecting Flash cookies and access to
 them by monitoring file access with the GNU/Linux \emph{strace} tool
@ -367,10 +367,10 @@ top 100 sites use Flash cookies for respawning.
 Even though Flash usage has declined during the last few years thanks to the
 development of the HTML5 standard, \citeauthor{buhovFLASH20thCentury2018}
 \cite{buhovFLASH20thCentury2018} have shown that despite major security flaws,
-Flash content is still served by 7.5\% of the top one million websites (2017).
+Flash content is still served by 7.5\% of the top one million web sites (2017).
 The W3Techs Web Technology Survey shows a similar trend and also offers an
-up-to-date measurement of 2.7\% of the top ten million websites for the year
+up-to-date measurement of 2.7\% of the top ten million web sites for the year
-2020 \cite{w3techsHistoricalYearlyTrends2020}. Due to the security concerns in
+2020 \cite{w3techsHistoricalYearlyTrends2020}. Due to the security concerns with
 using Flash, Google's popular video sharing platform YouTube switched by default
 to the HTML5 <video> tag in January of 2015
 \cite{youtubeengineeringYouTubeNowDefaults2015}. In 2017 Adobe announced that they
@ -389,13 +389,13 @@ injecting a Java applet into the \gls{DOM} of a page
 \subsection{Evercookie}
 \label{subsec:evercookie}
-Evercookie is JavaScript code that can be embedded in websites which allows to
+Evercookie is JavaScript code that can be embedded in web sites to store
 information permanently on the user's computer. When activated,
 information is not only stored in standard \gls{HTTP} cookies but also in
 various other places, providing redundancy where possible. A full list of
 locations used by Evercookie can be found on the project's GitHub page
 \cite{kamkarSamykEvercookie2020}. In case the user wants to get rid of all
-information stored by visiting a website that uses evercookies, every location
+information stored by visiting a web site that uses evercookies, every location
 has to be cleared because if one remains, all the other cookies are restored.
 The cookie deletion mechanisms that are provided by browsers by default do not
 clear all locations where evercookies are stored, which makes evercookie almost
@ -422,7 +422,7 @@ ways to accurately match an accumulated profile history of one identifier to
 another. This problem has been solved by modern trackers by using a mechanism
 called Cookie Synchronization or Cookie Matching. This technique allows multiple
 trackers to open an information sharing channel between each other without
-necessarily having to know the website the user visits.
+necessarily having to know the web site the user visits.
 \begin{figure}[ht]
 	\centering
@ -436,14 +436,14 @@ An example of how Cookie Synchronization works in practice is given in
 Figure~\ref{fig:cookie synchronization}. The two parties that are interested in
 tracking the user are called \emph{cloudflare.com} and \emph{google.com} in this
 example. The user they want to track is called \emph{browser}. \emph{Browser}
-first visits \emph{website1.com} which loads JavaScript from
+first visits \emph{website1.com} which loads JavaScript from
 \emph{cloudflare.com}. \emph{Cloudflare.com} sets a cookie in the browser with a
 tracking identifier called \emph{userID = 1234}. Next, \emph{browser} visits
-another website called \emph{website2.com} which loads an advertisement banner
+another web site called \emph{website2.com} which loads an advertisement banner
 from \emph{google.com}. \emph{Google.com} also sets a cookie with the tracking
 identifier \emph{userID = ABCD}. \emph{Browser} now has two cookies from two
 different providers, each of them knowing the user under a different identifier.
-When \emph{browser} visits a third website called \emph{website3.com} which
+When \emph{browser} visits a third web site called \emph{website3.com} which
 makes a request to \emph{cloudflare.com} and recognizes the user with the
 identifier \emph{userID = 1234}, \emph{cloudflare.com} sends an \gls{HTTP}
 redirect, redirecting \emph{browser} to \emph{google.com}. The redirect also
@ -481,7 +481,7 @@ top 1000 (46\%) use Cookie Synchronization with at least one other party.
 parties. \citeauthor{papadopoulosExclusiveHowSynced2018} show in
 \cite{papadopoulosExclusiveHowSynced2018} the threat that Cookie Synchronization
 poses to encrypted \gls{TLS} sessions by performing the cookie-syncing over
-unencrypted \gls{HTTP} even though the original request to the website was
+unencrypted \gls{HTTP} even though the original request to the web site was
 encrypted. This highlights the serious privacy implications for users of
 \gls{VPN} services trying to safeguard their traffic from a potentially
 malicious \gls{ISP}.
@ -499,11 +499,11 @@ settings in the Silverlight application. Silverlight's Isolated Storage is one
 of the methods evercookie (section~\ref{subsec:evercookie}) uses to make
 permanent deletion of cookies hard to do and to facilitate cookie respawning.
 Usage of Silverlight has seen a steady decline since 2011 even though it has
-been used by popular video streaming websites such as Netflix
+been used by popular video streaming web sites such as Netflix
 \cite{NetflixBeginsRollOut2010} and Amazon. Microsoft did not include
 Silverlight support in Windows 8 and declared end-of-life in a blog post for
 October of 2021 \cite{SilverlightEndSupport2015}. Usage of Silverlight currently
-hovers around 0.04\% for the top 10 million websites
+hovers around 0.04\% for the top 10 million web sites
 \cite{w3techsUsageStatisticsSilverlight2020}.
 \subsection{HTML5 Web Storage}
@ -529,7 +529,7 @@ applications. Due to it violating the same-origin policy, most major browsers
 have not implemented Global Storage.
 Local Storage does, however, obey the same-origin policy by only allowing the
-originating domain access to its name-value pairs. Every website has their own
+originating domain access to its name-value pairs. Every web site has its own
 separate storage area which maintains a clear separation of concerns. Local
 Storage lends itself to different use cases. Especially applications that
 should function even when no internet connection exists can use Local Storage to
@ -558,7 +558,7 @@ tracking domains.
 \label{subsec:html5 indexed database api}
 The need for client side storage to provide performant web applications that can
-also function offline, has prompted the inception of alternative methods to
+also function offline has prompted the inception of alternative methods to
 store and retrieve information. Consequently, the development of the HTML5
 standard has tried to fill that need by introducing HTML5 Web Storage and the
 HTML5 Indexed Database \gls{API}.
@ -618,7 +618,7 @@ section~\ref{subsec:evercookie}) to add another layer of redundancy for storing
 unique identifiers and respawning deleted ones. By performing static analysis on
 a dataset provided by the \gls{HTTP} Archive project
 \cite{soudersAnnouncingHTTPArchive2011}, \citeauthor{belloroKnowWhatYou2018}
-found that 1.34\% of the surveyed websites use Web SQL Database in one of their
+found that 1.34\% of the surveyed web sites use Web SQL Database in one of their
 subresources. 53.59\% of Web SQL Database usage is considered to be coming from
 known tracking domains. This ratio is lower for the first 10K web sites as
 determined by Alexa (in May 2018): 2.12\% use Web SQL Database and 39.9\% of
@ -641,20 +641,19 @@ A variety of caches exist and they are utilized for different purposes, leading
 to different forms of information exploitability for tracking users. This
 section introduces methods which are in most cases not prevalent but are more
 sophisticated and can thus be much harder to circumvent or block.
 \todo{Insert structure}
 \subsection{Web Cache}
 \label{subsec:web cache}
 Using the \gls{DOM} \gls{API}'s \texttt{Window.getComputedStyle()} method,
-websites were able to check a user's browsing history by utilizing the \gls{CSS}
+web sites were able to check a user's browsing history by utilizing the \gls{CSS}
 \texttt{:visited} selector. Links can be coloured depending on whether they have
-already been visited or not. The colours can be set by the website trying to
+already been visited or not. The colours can be set by the web site trying to
 find out what the user's browsing history is. JavaScript would then be used to
-generate links on the fly for websites that will be cross-checked with the
+generate links on the fly for web sites that will be cross-checked with the
 contents of the browsing history. After generating links, a script can check the
 colour, compare it with the colour that has been set for visited and non-visited
-websites and see if a website has already been visited or not.
+web sites and see if a web site has already been visited or not.
 A solution to the problem has been proposed and subsequently implemented by
 \citeauthor{baronPreventingAttacksUser2010}
@ -680,14 +679,14 @@ attributed to a single user but to a group as a whole can be used to more
 accurately identify members of said group.
 Other ways of utilizing a web browser's cache to track users are tracking
-whether a website asset (e.g., an image or script) has already been cached by
+whether a web site asset (e.g., an image or script) has already been cached by
-the user agent or not. If it has been cached, the website knows that is has been
+the user agent or not. If it has been cached, the web site knows that it has been
 visited before and if it has not been cached (the asset is downloaded from the
 server), the user agent visits for the first time. Another way is to embed
 identifiers in cached documents. An \gls{HTML} file can contain an identifier
 which is stored in a \texttt{<div>} tag and is cached by the user agent. The
 identifier can then be read from the cache on subsequent visits, even from third
-party websites.
+party web sites.
 \subsection{Cache Timing}
 \label{subsec:cache timing}
@ -699,18 +698,18 @@ cryptography to indirectly observe the generation or usage of a cipher key by
 measuring CPU noises, frequencies, power usage or other properties that allow
 conclusions to be drawn about the key. This type of attack is referred to as a
 side-channel attack. Cache timing exploits the fact that it takes time to load
-assets for a website. It works by measuring the time a client takes to access a
+assets for a web site. It works by measuring the time a client takes to access a
 specified resource. If the time is short, the resource has most likely been
 served from the cache and has thus been downloaded before, implying a visit to a
-website which uses that resource. If it takes longer than a cache hit would, on
+web site which uses that resource. If it takes longer than a cache hit would, on
 the other hand, the resource did not exist before and has to be downloaded now,
-suggesting that no other website using that resource has been visited before. In
+suggesting that no other web site using that resource has been visited before. In
 practice an attack might look like this (taken from
 \cite[p.~2]{feltenTimingAttacksWeb2000}):
 \begin{enumerate}
-	\item Alice visits a website from Bob called \texttt{bob.com}.
+	\item Alice visits a web site from Bob called \texttt{bob.com}.
-	\item Bob wants to find out whether Alice visited Charlie's website
+	\item Bob wants to find out whether Alice visited Charlie's web site
 		\texttt{charlie.com} in the past.
 	\item Bob chooses a file from \texttt{charlie.com} which is regularly
 		downloaded by visitors to that site.
@ -725,7 +724,7 @@ practice an attack might look like this (taken from
 \end{enumerate}
 Bob can do this process for multiple resources and for every user that visits
-his website, collecting browser history information on all of them. Since
+his web site, collecting browser history information on all of them. Since
 caches exist to boost performance and avoid unnecessary loading of content from
 servers which has already been downloaded before, timing attacks are very hard
 to circumvent because caches exist solely for that purpose. Countermeasures
@ -741,13 +740,13 @@ miss performance and turning off Java and JavaScript but concluded that they
 were unattractive or at worst ineffective. They propose a partial remedy for
 cache timing by introducing \emph{Domain Tagging} which requires that resources
 are tagged with the domain they have initially been loaded from. Once another
-website wants to determine whether a user has visited a site before by
+web site wants to determine whether a user has visited a site before by
 cross-loading a resource, the domain does not match the tagged domain on the
 resource. If that is the case, the initial cache hit gets transformed into a
 cache miss and the resource has to be downloaded again, fooling the attacker
-into believing that the origin website has not been visited before. It is
+into believing that the origin web site has not been visited before. It is
 necessary to mention that at the time (2000) \glspl{CDN} were not as widely
-used as today. Since websites rely on \glspl{CDN} to cache resources that are
+used as today. Since web sites rely on \glspl{CDN} to cache resources that are
 used on multiple sites and can thus be served much faster from cache, domain
 tagging would effectively nullify the performance boost a \gls{CDN} provides by
 converting every cache hit into a cache miss. The authors themselves question
@ -768,10 +767,10 @@ discussed so far has not tackled the problem through a quantitative perspective
 but instead focused on individual cases. Due to this missing piece,
 \citeauthor{sanchez-rolaBakingTimerPrivacyAnalysis2019}
 \cite{sanchez-rolaBakingTimerPrivacyAnalysis2019} conducted a survey on 10K
-websites to determine how feasible it is to perform a history sniffing attack on
+web sites to determine how feasible it is to perform a history sniffing attack on
 a large scale. Their tool \textsc{BakingTimer} collects timing information on
 \gls{HTTP} requests, checking for logged in status and sensitive data. Their
-results show that 71.07\% of the surveyed websites are vulnerable to the
+results show that 71.07\% of the surveyed web sites are vulnerable to the
 attack.
 \subsection{Cache Control Directives}
@ -803,7 +802,7 @@ identifier has been placed in the \gls{ETag} header, the server can answer
 requests to check for an updated resource always with an \gls{HTTP} 304
 Not Modified header, effectively persisting the unique identifier in the
 client's cache. During their 2011 survey of QuantCast.com's top 100 U.S. based
-websites \citeauthor{ayensonFlashCookiesPrivacy2011}
+web sites \citeauthor{ayensonFlashCookiesPrivacy2011}
 \cite{ayensonFlashCookiesPrivacy2011} found \texttt{hulu.com} to be using
 \glspl{ETag} as backup for tracking cookies that are set by \texttt{KISSmetrics}
 (an analytics platform). This allowed cookies to be respawned once they had been
@ -830,11 +829,11 @@ own cache (e.g., browsers).
 \citeauthor{kleinDNSCacheBasedUser2019} \cite{kleinDNSCacheBasedUser2019}
 demonstrated a tracking method which uses \gls{DNS} caches to assign unique
 identifiers to client machines. In order for the technique to work, the tracker
-has to have control over a web server as well as an authoritative \gls{DNS}
+has to have control over one or more web servers as well as an
-server which associates the web servers with a domain name under the control of
+authoritative \gls{DNS} server which associates the web servers with a domain
-the tracker. The tracking process starts once a user agent requests a web site
+name under the control of the tracker. The tracking process starts once a user
-which loads a script from one of the web servers the attacker is controlling.
+agent requests a web site which loads a script from one of the web servers the
-The process can then be sketched out as follows (see
+attacker is controlling. The process can then be sketched out as follows (see
 \cite[p.~5]{kleinDNSCacheBasedUser2019} for a detailed description).
 \begin{enumerate}
--- a/chapters/titlepage.pdf
+++ b/chapters/titlepage.pdf
--- a/main.tex
+++ b/main.tex
@ -19,6 +19,7 @@
 \usepackage{xr}
 \usepackage[acronym]{glossaries}
 \usepackage{lastpage}
 \usepackage{pdfpages}
 \glsenablehyper
@ -87,8 +88,15 @@
    \input{abbrev/acronym.tex}
    \includepdf[pages=-]{chapters/titlepage.pdf}
    \newpage
    \pagenumbering{roman}
    \subfile{chapters/erklaerung.tex}
    \thispagestyle{frontmatter}
    \subfile{chapters/abstract-de}
    \thispagestyle{frontmatter}
@ -104,7 +112,8 @@
    \listoflistings
    \thispagestyle{frontmatter}
-    \printglossaries
+    \printglossary
    \printglossary[type=\acronymtype]
    \thispagestyle{frontmatter}
    \subfile{chapters/introduction}
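
Note on the ETag passage in the chapters/methods.tex hunk above: the mechanism it describes (a unique identifier placed in the ETag header and kept alive by answering revalidation requests with HTTP 304 Not Modified) can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not code from the thesis or from any tracker: the class names `TrackingServer` and `BrowserCache` and their methods `handle` and `fetch` are hypothetical, and no real network traffic is involved.

```python
import uuid


class TrackingServer:
    """Hypothetical server that misuses the ETag validator as a user identifier."""

    def __init__(self):
        # Identifiers this server has handed out so far.
        self.seen = set()

    def handle(self, request_headers):
        etag = request_headers.get("If-None-Match")
        if etag is not None and etag in self.seen:
            # Returning visitor: always claim "not modified" so the cached
            # copy, and with it the identifier, stays in the client's cache.
            return 304, {"ETag": etag}, b""
        # First visit (or cleared cache): mint a fresh identifier as the ETag.
        new_id = '"' + uuid.uuid4().hex + '"'
        self.seen.add(new_id)
        return 200, {"ETag": new_id}, b"resource body"


class BrowserCache:
    """Hypothetical client that caches one resource together with its ETag."""

    def __init__(self):
        self.etag = None
        self.body = None

    def fetch(self, server):
        headers = {}
        if self.etag is not None:
            # Conditional request: send the stored validator back to the server.
            headers["If-None-Match"] = self.etag
        status, response_headers, body = server.handle(headers)
        if status == 200:
            self.etag = response_headers["ETag"]
            self.body = body
        return status, self.etag
```

In this sketch, a first call to `BrowserCache().fetch(server)` returns status 200 with a fresh identifier, and every later call from the same cache returns 304 with the same identifier, which is what lets the server recognise the client without any cookie. Clearing the cache (a new `BrowserCache`) yields a new identifier.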