\chapter{Defenses against Tracking}%
\label{chap:defenses against tracking}

The proliferation of tracking across the web has led to the development of a myriad of tools, each with its own advantages and disadvantages. Some tracking methods can be easily mitigated by changing browser settings or by disabling certain technologies. More often than not, these measures not only stop or limit tracking but also severely hamper the internet experience for end users. Some of the more advanced tools in particular require user input to know which items to block and which to let through. This in turn requires expertise that few regular internet users possess, which further complicates defending against tracking.

This chapter introduces methods and tools that have proven effective against tracking on the web. It is split into two parts: the first surveys techniques that can be applied to limit tracking and the second presents tools for managing tracking on the web. The focus lies on defending against the methods discussed in chapter~\ref{chap:tracking methods}.

\section{Techniques}
\label{sec:techniques}

The aim of this section is to present comparatively simple techniques that a user can employ to limit tracking. The benefit of these methods is that they are built into modern browsers and therefore do not require users to know how to install any additional tools. Although their implementations vary from one browser to another, the basic idea of the underlying functionality remains the same.

\subsection{Opt-out and Opt-in}
\label{subsec:opt-out}

To opt out in the context of web tracking means to make use of the possibility of turning off data collection by a web site. After the user has opted out of either all data collection or only a subset of the data that a web site collects, an opt-out cookie is set, indicating the user's preference.
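As a minimal sketch of this mechanism, the following Python snippet shows how a cooperating server might check for an opt-out cookie before assigning a tracking identifier. The cookie names \texttt{optout} and \texttt{uid} and the function \texttt{response\_cookies} are hypothetical and chosen for illustration only; real trackers use their own names and formats.

```python
from http.cookies import SimpleCookie
import uuid

OPT_OUT_COOKIE = "optout"  # hypothetical cookie name
TRACKING_COOKIE = "uid"    # hypothetical cookie name
ONE_YEAR = 60 * 60 * 24 * 365


def response_cookies(cookie_header: str) -> SimpleCookie:
    """Return the cookies a cooperating server would set for one request."""
    request = SimpleCookie()
    request.load(cookie_header)
    response = SimpleCookie()

    if OPT_OUT_COOKIE in request:
        # The user has opted out: renew the stored preference and do not
        # create a tracking identifier.
        response[OPT_OUT_COOKIE] = "1"
        response[OPT_OUT_COOKIE]["max-age"] = ONE_YEAR
    elif TRACKING_COOKIE not in request:
        # No preference stored: under an opt-out model, data collection
        # is the default, so a fresh identifier is assigned.
        response[TRACKING_COOKIE] = uuid.uuid4().hex
    return response
```

The sketch also illustrates a weakness of the approach: the preference exists only as long as the opt-out cookie does, and it only has an effect if the server chooses to honor it.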
Whereas opting out generally means that data collection happens by default, opt-in requires that data collection is turned off by default. In theory, both models allow users to have fine-grained control over which aspects of their online presence they are comfortable with sharing, by either opting out or opting in (depending on how web sites ask for consent). In practice, however, the seemingly irrelevant difference between the two leads to very different outcomes with respect to the number of users that are tracked.

For either opt-out or opt-in to work, a web site has to provide an option for doing so. Because web sites increasingly use third parties to manage data collection on their site, consent or rejection has to be passed on to these third parties and they have to be willing to accept such a decision.

Since the European Union's \gls{GDPR} came into force in 2018, service providers operating in the European Union are required to ask users for explicit consent before collecting any data, except when that data is absolutely necessary to ensure basic functionality. It is not allowed to merely notify the user that consent to data collection is given by continuing to visit the web site. Furthermore, if consent is not given, the web site provider is not allowed to block the user from visiting the web site.

Even before the \gls{GDPR}, the EU required web sites to ask for informed consent via the ePrivacy Directive, which came into force in 2013. \citet{trevisanYearsEUCookie2019} use their tool \emph{CookieCheck} to evaluate how many of the 35,000 surveyed sites comply with the legislation put forth in the ePrivacy Directive. Their findings indicate that almost half (49\%) of the web sites use profiling technologies without consent. Similarly, \citet{sanchez-rolaCanOptOut2019a} show that, a year after the \gls{GDPR} came into force, tracking is still prevalent and already happens before user consent is given.
\citet{huCharacterisingThirdParty2019} come to a similar conclusion while only looking at third-party tracking: the number of cookies stored on a user's computer has not changed significantly since before the \gls{GDPR}. In yet another survey, covering the top 500 web sites as ranked by Alexa, \citet{degelingWeValueYour2019} conclude that the amount of tracking before and after the \gls{GDPR} stayed the same and that only 37 sites ask for consent before storing any cookies.

Provided that users are given a choice whether to share their personal information and that web sites honor such a request, all of the methods discussed in chapter~\ref{chap:tracking methods} can be defended against.

\subsection{Clearing Browser History}
\label{subsec:Clearing Browser History}

For our purposes, clearing the browser history means clearing not only the list of visited web sites but also cookies and other relevant data that is saved with a visit to a web site. All major browsers offer this function and what they delete is similar. Firefox, for example, allows clearing the browsing and download history, form and search history, cookies (including Flash cookies), the cache, active logins, offline web site data and site preferences such as permissions, zoom level and character encodings.

This technique is only beneficial in the long term if users apply it frequently to stop any accumulation of tracking identifiers in caches, cookies or other site data. The downside is that not having a history to go back to can hamper the user experience, depending on the workflow of each user. Furthermore, opt-out or opt-in preferences are deleted as well, making the technique in section~\ref{subsec:opt-out} less effective.

Clearing the browser history is effective against some storage-based tracking methods. Evercookie (section~\ref{subsec:evercookie}) and cookie synchronization (section~\ref{subsec:cookie synchronization}), however, are designed to respawn items in the browser history and can therefore not be mitigated.
Almost all cache-based methods are also mitigated by frequently clearing the browser history as long as users do not authenticate themselves with a web service. However, \citet{kleinDNSCacheBasedUser2019} demonstrate that their \gls{DNS} cache attack works across history deletions. Session-based methods are not affected by history clearing because they are intended to track a user for one session only.

\subsection{Private Browsing Mode}
\label{subsec:Private Browsing Mode}

The private browsing mode is a feature offered by all major browsers that is intended to improve privacy by not allowing access to storage areas within the browser. Users associate it with an increase in privacy compared to the normal or public mode. Unfortunately, implementations of the private browsing mode are inconsistent across browsers and what is deemed worthy of protection is largely up to browser vendors. \citet[p.~440]{xuUCognitoPrivateBrowsing2015} provide a comprehensive overview of browsers and their private browsing mode practices. Most notably, Safari allows access to earlier cookies, history and HTML5 storage while other browsers disallow it. Table~\ref{tab:private browsing mode} lists, for each major browser, whether the methods from chapter~\ref{chap:tracking methods} can still be used for tracking in private browsing mode.
\begin{sidewaystable}
  \caption{Private browsing mode for major browsers}
  \label{tab:private browsing mode}
  \centering
  \begin{tabular}{|l|l|c|c|c|c|}
    \hline
    \multicolumn{1}{|c|}{\textbf{Section}} & \multicolumn{1}{c|}{\textbf{Tracking Method}} & \multicolumn{4}{c|}{\textbf{Tracking Possible in Private Browsing Mode}} \\ \hline
    \multicolumn{2}{|l|}{} & \textbf{Safari} & \textbf{Firefox} & \textbf{Chrome} & \textbf{IE} \\ \hline
    \multicolumn{6}{|l|}{\textbf{Session-based}} \\ \hline
    \ref{subsec:passing information in urls} & Passing Information in URLs & NA & NA & NA & NA \\ \hline
    \ref{subsec:hidden form fields} & Hidden Form Fields & NA & NA & NA & NA \\ \hline
    \ref{subsec:http referer} & HTTP Referer & NA & NA & NA & NA \\ \hline
    \ref{subsec:explicit authentication} & Explicit Authentication & NA & NA & NA & NA \\ \hline
    \ref{subsec:window.name dom property} & window.name DOM property & NA & NA & NA & NA \\ \hline
    \multicolumn{6}{|l|}{\textbf{Storage-based}} \\ \hline
    \ref{subsec:http cookies} & HTTP cookies & Yes & No & No & No \\ \hline
    \ref{subsec:flash cookies and java jnlp persistenceservice} & Flash Cookies and Java JNLP PersistenceService & Yes & Yes & Yes & Yes \\ \hline
    \ref{subsec:evercookie} & Evercookie & Yes & No & No & No \\ \hline
    \ref{subsec:cookie synchronization} & Cookie Synchronization & Yes & Yes & Yes & Yes \\ \hline
    \ref{subsec:silverlight isolated storage} & Silverlight Isolated Storage & Yes & No & No & No \\ \hline
    \ref{subsec:html5 web storage} & HTML5 Web Storage & Yes & No & No & No \\ \hline
    \ref{subsec:html5 indexed database api} & HTML5 Indexed Database API & Yes & No & No & No \\ \hline
    \ref{subsec:web sql database} & Web SQL Database & Yes & No & No & No \\ \hline
    \multicolumn{6}{|l|}{\textbf{Cache-based}} \\ \hline
    \ref{subsec:web cache} & Web Cache & Yes & No & No & No \\ \hline
    \ref{subsec:cache timing} & Cache Timing & Yes & No & No & No \\ \hline
    \ref{subsec:cache control directives} & Cache Control Directives & Yes & No & No & No \\ \hline
    \ref{subsec:dns cache} & DNS Cache & Yes & Yes & Yes & Yes \\ \hline
    \ref{subsec:tls session resumption} & TLS Session Resumption & Yes & No & No & No \\ \hline
  \end{tabular}
\end{sidewaystable}

\subsection{Do Not Track}
\label{subsec:Do Not Track}

\gls{DNT} \cite{w3cTrackingPreferenceExpression2019} is an \gls{HTTP} header field that browsers can send with a request to indicate that the user prefers not to be tracked or prefers to allow tracking. All major browsers have implemented it and offer the user the possibility of sending the header with every request. Since its inception in 2011, adoption by trackers has been so slow that \gls{DNT} is considered to be deprecated and development of the standard has halted. Originally, it was intended to be the main way of opting out of tracking but without tracker compliance, it slowly faded into obscurity.

Due to its voluntary nature and slow to nonexistent adoption, \gls{DNT} does not in practice provide any protection against any of the tracking methods discussed in chapter~\ref{chap:tracking methods}. Indeed, \citet{englehardtCookiesThatGive2015} show that the \gls{DNT} header field does not influence the level of tracking a user experiences at all. For \gls{DNT} to be effective, the advertising landscape would have to change such that users accept advertisements as a necessary factor in keeping the Internet `free' and trackers respect a user's choice not to be tracked.

\subsection{Privacy-focused Search Engines}
\label{subsec:Privacy-focused Search Engines}

Using privacy-focused search engines is often the first step in protecting a user's privacy. Search is a cornerstone of the Internet and thus almost every user searches for something upon opening the browser. With every search request, the search engine can infer information about the user, which gets added to a profile. This profile is then used to enable personalized search results.
Users trying to protect their privacy by using search engines other than the default ones (Google, Bing, Yahoo, Baidu, \dots) might find themselves in a dilemma. Personalized search usually provides more relevant results overall, and switching to a privacy-focused search engine, which usually has a smaller market share, might lead to less relevant results. With Google having a market share of almost 92\% as of June 2020 \cite{statcounterSearchEngineMarket}, users may find that Google's search results are better than everyone else's, making a switch to other search engines particularly difficult.

Despite the market dominance of Google, smaller, privacy-focused search engines such as DuckDuckGo \cite{DuckDuckGoa} and Startpage \cite{StartpageCom} exist. Although those search engines claim not to collect any personal information, these claims cannot be verified easily and thus users have to trust them. Open source solutions such as searx \cite{tauberAsciimooSearx2020} can be self-hosted by users with enough expertise and therefore eliminate the need to trust big search engine providers. searx is a metasearch engine: it does not crawl the Internet on its own but aggregates results from different search engines.

The benefit of using privacy-focused search engines is that they obfuscate the \gls{HTTP} Referer field (see section~\ref{subsec:http referer}) by not passing the search terms on to the linked web site. Additionally, they often abstain from showing adverts on result pages, protecting user data from third parties that seek to monetize it.

\section{Tools}
\label{sec:tools}

This section focuses on external tools that can be installed either as a plugin within the browser or as a standalone program. Specific user knowledge is only necessary in some cases, namely when users want to have fine-grained control over their data sharing preferences.
\subsection{Blacklists}
\label{subsec:blacklists}

Blacklists are a central component of tracking protection on the Web. They block requests to web sites that are on the blacklist because they are known to track users. Only third-party requests are blocked by blacklists because blocking first parties would result in those web sites not being accessible at all. Blacklists usually start out as small lists of manually selected web sites. Over time and as their user base grows, more and more web sites are added, resulting in a good first defense against tracking on moderately popular web sites. The effectiveness of \glspl{TPL} depends on how quickly new domains belonging to trackers are added to the list and how quickly old, supposedly inactive, domains are removed again. Furthermore, modern browser plugins aggregate multiple, independently maintained blocklists into one big blacklist, improving the overall detection rate. Since some lists are aimed at blocking, for example, cryptocurrency mining applications on web sites and others at regular third-party requests, knowledgeable users can customize their blocking preferences by only including those lists that they deem necessary. A well-known list used by popular browser plugins such as Adblock Plus \cite{Adblock} and uBlock Origin \cite{hillGorhillUBlock2020} is EasyList \cite{EasyList}. Both plugins use this list as a basis and add further lists to it.

\citet{merzdovnikBlockMeIf2017} provide an evaluation of different browser plugins (Adblock Plus, Disconnect, Ghostery, Privacy Badger and uBlock Origin) and their tracking protection capabilities. They identify three approaches to curating the rulesets that are used by these plugins. Adblock Plus and uBlock Origin rely on EasyList and its additional subscriptions, which are \emph{community-driven}. Here, the community maintains the blocklists and updates are monitored through a public repository.
Ghostery and Disconnect use blocklists that are curated by a \emph{centralized} entity such as a company. In Ghostery's case, the centralized entity is Cliqz GmbH. Centralized entities raise the question of how they fund themselves, especially when the application they develop has been released to the open source community. The third approach works by curating blocklists \emph{algorithmically}. Privacy Badger, developed by the \gls{EFF}, does not maintain a regularly updated blocklist but instead relies on heuristics to detect third-party tracking.

In their survey of about 120,000 web sites, \citet{merzdovnikBlockMeIf2017} find that the most popular choice, Adblock Plus, blocks the fewest third-party requests. Additionally, their results indicate that centralized blocklists are more effective than community-driven ones in reducing the number of requests to third parties. Algorithmic approaches such as Privacy Badger lead to a comparatively high number of web site timeouts. Furthermore, Privacy Badger does not perform well on analytics.

In general, using blacklists can be very effective against every form of tracking that relies on third-party requests. As soon as a first party performs the same tracking that the third party does, however, blacklists do not provide any protection.

\subsection{TOR}
\label{subsec:tor}

\subsection{Virtual Private Networks}
\label{subsec:virtual private networks}

\subsection{Request Policy}
\label{subsec:Request Policy}