\chapter{Defenses against Tracking}%
\label{chap:defenses against tracking}

The proliferation of tracking across the web has led to the development of a
myriad of tools that each have their own advantages and disadvantages. Some
tracking methods can be easily mitigated by changing browser settings or by
disabling certain technologies. More often than not, these methods not only stop
or limit tracking but also severely hamper the internet experience for end
users. Some of the more advanced tools in particular require user input to know
which items to block and which to let through. This in turn requires expertise
that few regular internet users possess, further complicating the defense
against tracking. This chapter introduces methods and tools that have been
proven to be effective against tracking on the web. It is split into two parts:
the first surveys techniques that can be applied to limit tracking and the
second presents tools for managing tracking on the web. The focus lies on
defending against the methods discussed in chapter~\ref{chap:tracking methods}.

\section{Techniques}
\label{sec:techniques}

The aim of this section is to present comparatively simple techniques that a
user can employ to limit tracking. The benefit of these methods is that they are
built into modern browsers and therefore do not require users to install
additional tools or to know how to do so. Although their implementations vary
from one browser to another, the basic idea of the underlying functionality
remains the same.

\subsection{Opt-out and Opt-in}
\label{subsec:opt-out}

To opt out in the context of web tracking means to make use of the possibility
of turning off data collection by a web site. After the user has opted out of
either all data collection or only a subset of the data that a web site
collects, an opt-out cookie is set, indicating the user's preference. Whereas
opt-out means that data collection happens by default until the user objects,
opt-in requires that data collection is turned off by default until the user
consents. In theory, both models give users fine-grained control over which
aspects of their online presence they are comfortable with sharing (depending
on how web sites ask for consent). In practice, however, the seemingly
irrelevant difference between the two leads to very different outcomes with
respect to the number of users that are tracked.

For either opt-out or opt-in to work, a web site has to provide an option for
doing so. Because web sites increasingly use third parties to manage data
collection on their site, consent or rejection has to be passed to these third
parties and they have to be willing to accept such a decision. Since the
European Union's \gls{GDPR} came into force in 2018, service providers operating
in the European Union are required to ask users for explicit consent before
collecting any data, except when that data is absolutely necessary to ensure
basic functionality. Merely notifying the user that continuing to visit the web
site constitutes consent to data collection is not allowed. Furthermore, if
consent is not given, the web site provider is not allowed to block the user
from visiting the web site. Even before the \gls{GDPR}, the EU required web
sites to ask for informed consent via the ePrivacy Directive which came into
force in 2013. \citet{trevisanYearsEUCookie2019} use their tool
\emph{CookieCheck} to evaluate how many of the surveyed 35,000 sites comply with
the legislation put forth in the ePrivacy Directive. Their findings indicate
that almost half (49\%) of the web sites use profiling technologies without
consent. Similarly, \citet{sanchez-rolaCanOptOut2019a} show that, a year after
the \gls{GDPR} came into force, tracking is still prevalent and already takes
place before the user has given consent. \citet{huCharacterisingThirdParty2019}
come to a similar conclusion while only looking at third party tracking: the
number of cookies stored on a user's computer has not changed significantly
since before the \gls{GDPR}. In yet another survey of the top 500 web sites as
ranked by Alexa, \citet{degelingWeValueYour2019} conclude that the amount of
tracking before and after the \gls{GDPR} stayed the same and that only 37 sites
ask for consent before storing any cookies.

Provided that users are given a choice about whether to share their personal
information and that web sites honor that choice, all of the methods discussed
in chapter~\ref{chap:tracking methods} can be defended against.

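As a minimal sketch of the mechanism described in this section, the following
Python fragment shows how a cooperating web site could decide whether to set a
tracking identifier. It is an illustration under stated assumptions and is not
taken from any real consent framework: the cookie names \texttt{optout},
\texttt{consent} and \texttt{uid} as well as the function
\texttt{cookies\_to\_set} are hypothetical.

\begin{verbatim}
```python
import uuid
from http.cookies import SimpleCookie

# Hypothetical cookie names, chosen for this illustration only.
OPT_OUT_COOKIE = "optout"
CONSENT_COOKIE = "consent"
TRACKING_COOKIE = "uid"


def cookies_to_set(cookie_header: str, opt_in_required: bool) -> dict:
    """Return the cookies a cooperating site would set for one request.

    cookie_header   -- raw value of the request's Cookie header
    opt_in_required -- True for an opt-in regime (no tracking by default),
                       False for an opt-out regime (tracking by default)
    """
    jar = SimpleCookie()
    jar.load(cookie_header)

    def flag(name: str) -> bool:
        # A preference counts as set only if the cookie carries the value "1".
        return name in jar and jar[name].value == "1"

    if flag(OPT_OUT_COOKIE):
        return {}  # the user objected: never set an identifier
    if opt_in_required and not flag(CONSENT_COOKIE):
        return {}  # opt-in regime and no consent recorded yet
    if TRACKING_COOKIE in jar:
        return {}  # an identifier already exists, nothing new to set
    return {TRACKING_COOKIE: uuid.uuid4().hex}
```
\end{verbatim}

With \texttt{opt\_in\_required} set to false, a first-time visitor without any
cookies receives an identifier; with it set to true, the same visitor does not.
The sketch also makes the dependency on stored state visible: once the opt-out
cookie is deleted, the preference is lost.
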
\subsection{Clearing Browser History}
\label{subsec:Clearing Browser History}

For our purposes, clearing the browser history means not only clearing the list
of web sites that have been visited but also cookies and other relevant data
that is saved with a visit to a web site. All major browsers offer this function
and what they delete is similar. Firefox, for example, allows clearing the
browsing and download history, form and search history, cookies (including Flash
cookies), the cache, active logins, offline web site data and site preferences
such as permissions, zoom level and character encodings. This technique is only
beneficial in the long term if users do it frequently to stop any accumulation
of tracking identifiers in caches, cookies or other site data. The downside is
that not having a history to go back to can hamper user experience depending on
the workflow of each user. Furthermore, opt-out or opt-in preferences are
deleted as well, making the technique in section~\ref{subsec:opt-out} less
effective.

Clearing the browser history is effective against some storage-based tracking
methods. Evercookie (section~\ref{subsec:evercookie}) and cookie synchronization
(section~\ref{subsec:cookie synchronization}), however, are designed to respawn
items in the browser history and therefore cannot be mitigated this way. Almost
all cache-based methods are also mitigated by frequently clearing the browser
history as long as users do not authenticate themselves with a web service. An
exception is the \gls{DNS} cache attack by \citet{kleinDNSCacheBasedUser2019},
who demonstrate that it works across history deletions. Session-based methods
are not affected by history clearing because they are intended to track a user
for one session only.

\subsection{Private Browsing Mode}
\label{subsec:Private Browsing Mode}

The private browsing mode is a feature offered by all major browsers that is
intended to improve privacy by not allowing access to storage areas within the
browser. Users associate it with an increase in privacy compared to the normal
or public mode. Unfortunately, implementations of the private browsing mode are
inconsistent across browsers and what is deemed worthy of protection is largely
up to browser vendors. \citet[p.~440]{xuUCognitoPrivateBrowsing2015} provide a
comprehensive overview of browsers and their private browsing mode practices.
Most notably, Safari allows access to earlier cookies, history and HTML5 storage
while other browsers disallow it. Table~\ref{tab:private browsing mode} lists,
for each method from chapter~\ref{chap:tracking methods}, whether the major
browsers still permit tracking in private browsing mode.

\begin{sidewaystable}
  \caption{Private browsing mode for major browsers}
  \label{tab:private browsing mode}
  \centering
  \begin{tabular}{|l|l|c|c|c|c|}
    \hline
    \multicolumn{1}{|c|}{\textbf{Section}} & \multicolumn{1}{c|}{\textbf{Tracking Method}} & \multicolumn{4}{c|}{ \textbf{Tracking in Private Browsing Mode}} \\
    \hline
    \multicolumn{2}{|l|}{} & \textbf{Safari} & \textbf{Firefox} & \textbf{Chrome} & \textbf{IE} \\
    \hline
    \multicolumn{6}{|l|}{\textbf{Session-based} } \\
    \hline
    \ref{subsec:passing information in urls} & Passing Information in URLs & NA & NA & NA & NA \\
    \hline
    \ref{subsec:hidden form fields} & Hidden Form Fields & NA & NA & NA & NA \\
    \hline
    \ref{subsec:http referer} & HTTP Referer & NA & NA & NA & NA \\
    \hline
    \ref{subsec:explicit authentication} & Explicit Authentication & NA & NA & NA & NA \\
    \hline
    \ref{subsec:window.name dom property} & window.name DOM property & NA & NA & NA & NA \\
    \hline
    \multicolumn{6}{|l|}{\textbf{Storage-based} } \\
    \hline
    \ref{subsec:http cookies} & HTTP cookies & Yes & No & No & No \\
    \hline
    \ref{subsec:flash cookies and java jnlp persistenceservice} & Flash Cookies and Java JNLP PersistenceService & Yes & Yes & Yes & Yes \\
    \hline
    \ref{subsec:evercookie} & Evercookie & Yes & No & No & No \\
    \hline
    \ref{subsec:cookie synchronization} & Cookie Synchronization & Yes & Yes & Yes & Yes \\
    \hline
    \ref{subsec:silverlight isolated storage} & Silverlight Isolated Storage & Yes & No & No & No \\
    \hline
    \ref{subsec:html5 web storage} & HTML5 Web Storage & Yes & No & No & No \\
    \hline
    \ref{subsec:html5 indexed database api} & HTML5 Indexed Database API & Yes & No & No & No \\
    \hline
    \ref{subsec:web sql database} & Web SQL Database & Yes & No & No & No \\
    \hline
    \multicolumn{6}{|l|}{\textbf{Cache-based} } \\
    \hline
    \ref{subsec:web cache} & Web Cache & Yes & No & No & No \\
    \hline
    \ref{subsec:cache timing} & Cache Timing & Yes & No & No & No \\
    \hline
    \ref{subsec:cache control directives} & Cache Control Directives & Yes & No & No & No \\
    \hline
    \ref{subsec:dns cache} & DNS Cache & Yes & Yes & Yes & Yes \\
    \hline
    \ref{subsec:tls session resumption} & TLS Session Resumption & Yes & No & No & No \\
    \hline
  \end{tabular}
\end{sidewaystable}

\subsection{Do Not Track}
\label{subsec:Do Not Track}

\gls{DNT} \cite{w3cTrackingPreferenceExpression2019} is an \gls{HTTP} header
field that browsers can send with a request to indicate that the user prefers
not to be tracked or prefers to allow tracking. All major browsers have
implemented it and offer the user the possibility of sending the header with
every request. Since its inception in 2011, adoption by trackers has been so
slow that \gls{DNT} is considered to be deprecated and development of the
standard has halted. Originally, it was intended to be the main way of opting
out of tracking but without tracker compliance, it slowly faded into obscurity.

Due to its voluntary nature and slow to no adoption, \gls{DNT} does not in
practice provide any protection against any of the tracking methods discussed in
chapter~\ref{chap:tracking methods}. Indeed,
\citet{englehardtCookiesThatGive2015} show that the \gls{DNT} header field does
not influence the level of tracking a user experiences at all. For \gls{DNT} to
be effective, the advertising landscape would have to change in such a way that
users see advertisements as a necessary factor in keeping the Internet `free'
and trackers respect a user's choice not to be tracked.

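How a server that chose to honor \gls{DNT} might interpret the header can be
sketched as follows. The header values follow the W3C Tracking Preference
Expression draft (\texttt{1} for `do not track', \texttt{0} for consent to
tracking, no header if no preference is expressed). The function names
\texttt{tracking\_preference} and \texttt{may\_track} and the policy of tracking
when no preference is sent are assumptions made for this sketch and do not
describe any particular tracker.

\begin{verbatim}
```python
from typing import Mapping, Optional


def tracking_preference(headers: Mapping[str, str]) -> Optional[bool]:
    """Interpret the DNT request header.

    Returns True if the user prefers not to be tracked (DNT: 1),
    False if the user allows tracking (DNT: 0) and None if no valid
    preference was sent.
    """
    for name, value in headers.items():
        if name.lower() == "dnt":  # header names are case-insensitive
            first = value.strip()[:1]  # extensions may follow the first character
            if first == "1":
                return True
            if first == "0":
                return False
    return None


def may_track(headers: Mapping[str, str]) -> bool:
    """Hypothetical policy of a compliant server: track unless DNT is 1."""
    return tracking_preference(headers) is not True
```
\end{verbatim}

The sketch illustrates why the mechanism is purely voluntary: nothing in the
protocol forces a server to call a function like \texttt{may\_track} before
storing or reading identifiers.
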
\subsection{Privacy-focused Search Engines}
\label{subsec:Privacy-focused Search Engines}

Using privacy-focused search engines is often the first step in protecting a
user's privacy. Search is a cornerstone of the Internet and thus almost every
user searches for something upon opening the browser. With every search request,
the search engine can infer information about the user which gets added to a
profile. This profile is then used to enable personalized search results. Users
trying to protect their privacy by using search engines other than the default
ones (Google, Bing, Yahoo, Baidu, \dots) might find themselves in a dilemma.
Personalized search results are usually more relevant overall and switching to a
privacy-focused search engine, which usually has a smaller market share, might
lead to less relevant results. With Google having a market share of almost 92\%
as of June 2020 \cite{statcounterSearchEngineMarket}, users may find that
Google's search results are better than everyone else's, making a switch to
other search engines particularly difficult. Despite the market dominance of
Google, smaller, privacy-focused search engines such as DuckDuckGo
\cite{DuckDuckGoa} and Startpage \cite{StartpageCom} exist. Although those
search engines claim not to collect any personal information, these claims
cannot be verified easily and thus users have to trust them. Open source
solutions such as searx \cite{tauberAsciimooSearx2020} can be self-hosted by
users with enough expertise and therefore eliminate the need to trust big search
engine providers. Like searx, metasearch engines do not crawl the Internet on
their own but aggregate results from different search engines.

The benefit of using privacy-focused search engines is that they obfuscate the
\gls{HTTP} Referer field (see section~\ref{subsec:http referer}) so that search
terms are not forwarded to the linked web site. Additionally, they often abstain
from showing adverts on result pages, protecting user data from third parties
that seek to monetize it.

\section{Tools}
\label{sec:tools}

This section focuses on external tools that can be installed either as a plugin
within the browser or as a standalone program. Specific user knowledge is only
necessary in some cases when users want to have fine-grained control over their
data sharing preferences.

\subsection{Blacklists}
\label{subsec:blacklists}

Blacklists are a central component of tracking protection on the Web. They block
requests to web sites that are on the blacklist and are known for their tracking
purposes. Only third party requests are blocked by blacklists because blocking
first parties would result in those web sites not being accessible at all.
Blacklists usually start out as small lists of manually selected web sites. Over
time and as their user base grows, more and more web sites are added, resulting
in a good first defense against tracking on moderately popular web sites. The
effectiveness of \glspl{TPL} depends on how quickly new domains belonging to
trackers are added to the list and how quickly old, supposedly inactive, domains
are removed again. Furthermore, modern browser plugins aggregate multiple,
independently maintained blocklists into one big blacklist, improving the
overall detection rate. Since some lists are aimed at blocking, for example,
cryptocurrency mining applications on web sites and others at regular third
party requests, knowledgeable users can customize their blocking preferences by
only including those lists that they deem necessary. A well-known list used by
popular browser plugins such as Adblock Plus \cite{Adblock} and uBlock Origin
\cite{hillGorhillUBlock2020} is EasyList \cite{EasyList}. Both plugins use it as
a basis and add further lists on top of it.

\citet{merzdovnikBlockMeIf2017} provide an evaluation of different browser
plugins (Adblock Plus, Disconnect, Ghostery, Privacy Badger and uBlock Origin)
and their tracking protection capabilities. They identify three approaches to
curating the rulesets that are used by these plugins. Adblock Plus and uBlock
Origin rely on EasyList and its additional subscriptions which are
\emph{community-driven}. Here, the community maintains the blocklists and
updates are monitored through a public repository. Ghostery and Disconnect use
blocklists that are curated by a \emph{centralized} entity such as a company. In
Ghostery's case, the centralized entity is Cliqz GmbH. Centralized entities
raise the question of how they fund themselves, especially when the application
they develop has been released to the open source community. The third approach
works by curating blocklists \emph{algorithmically}. Privacy Badger, developed
by the \gls{EFF}, does not maintain a regularly updated blocklist but instead
relies on heuristics to detect third party tracking.

In their survey of about 120,000 web sites, \citet{merzdovnikBlockMeIf2017} find
that the most popular choice, Adblock Plus, blocks the fewest requests by third
parties. Additionally, their results indicate that centralized blocklists are
more effective than community-driven ones in reducing the number of requests to
third parties. Algorithmic approaches such as Privacy Badger lead to a
comparatively high number of web site timeouts. Furthermore, Privacy Badger does
not perform well on analytics.

In general, using blacklists can be very effective against every form of
tracking that relies on third party requests. As soon as a first party performs
the same tracking that the third party does, blacklists do not provide any
protection.

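The basic decision a blacklist-based plugin makes for every request can be
sketched in a few lines of Python. This is a deliberately simplified
illustration and not the filter syntax or matching logic of EasyList, Adblock
Plus or uBlock Origin: the blocklist entries are made-up domains under the
reserved \texttt{.example} top-level domain, the function names are
hypothetical, and the `site' of a host is approximated by its last two labels,
whereas real tools consult the Public Suffix List.

\begin{verbatim}
```python
from urllib.parse import urlsplit

# Made-up tracker domains, for illustration only.
BLOCKLIST = {"tracker.example", "ads.example"}


def host_of(url: str) -> str:
    """Lower-cased host name of a URL, or an empty string if there is none."""
    return (urlsplit(url).hostname or "").lower()


def site_of(host: str) -> str:
    """Naive approximation of the registrable domain: the last two labels."""
    return ".".join(host.split(".")[-2:])


def is_third_party(page_url: str, request_url: str) -> bool:
    """A request is third party if it leaves the site of the visited page."""
    return site_of(host_of(page_url)) != site_of(host_of(request_url))


def is_blocked(page_url: str, request_url: str, blocklist=BLOCKLIST) -> bool:
    """Block third party requests whose host or a parent domain is listed."""
    if not is_third_party(page_url, request_url):
        return False  # first party requests are never blocked
    labels = host_of(request_url).split(".")
    # Check the full host and every parent domain against the list.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```
\end{verbatim}

The early return for first party requests mirrors the limitation stated above:
a tracker that is served from the visited site itself is never matched, no
matter how complete the list is.
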
\subsection{TOR}
\label{subsec:tor}

\subsection{Virtual Private Networks}
\label{subsec:virtual private networks}

\subsection{Request Policy}
\label{subsec:Request Policy}