\chapter{Introduction}
\label{chap:introduction}

Internet traffic has risen at an unprecedented rate over the last few years,
and the growth is still accelerating. With it, an increasing amount of user
data is sent over the Internet. Companies in large industries such as social
networking, advertising, Internet service provision and online news analyze
this data. Although many online services appear to be free for individual
users, the companies behind them have to sustain themselves and turn a profit.
This has led firms to work extensively with user data in order to extract
meaningful information from the way users interact with their services. The
collected and inferred information can then be sold to interested parties,
which use it to personalize their own services, yielding higher customer
engagement and thus higher profits. End users get the short end of the stick:
often unknowingly, they give away their data without gaining much in return.
Because the means of data collection on the Internet are becoming increasingly
invasive and pervasive, tools to defend against such privacy intrusions are
being developed. Users benefit from knowing how web sites track their visitors,
because this knowledge allows them to protect themselves against these tracking
mechanisms. The aim of this thesis is to give an overview of tracking methods
and of the tools available to defend against them. It seeks to answer the
following research question: \emph{Which stateful tracking methods are used to
track individuals on the Internet and which countermeasures exist?}

\section{Terms and Scope}
\label{sec:terms and scope}

This thesis focuses on web tracking as employed by, for example, advertising
companies. When users visit a web site that embeds third-party content from
advertisers, those advertisers collect pieces of information about them. This
information is not yet associated with a particular person but with an online
identity, which is usually tied to a unique identifier. On their own, unique
identifiers are of limited use because the same user might be assigned several
of them, each linked to a different set of information. Tracking mechanisms are
used to aggregate these separate pieces of information into a single profile
that approximates a user's personality, needs and wants. In many cases the goal
is to keep a tracking identifier on the user's computer for as long as possible
and to avoid assigning multiple identifiers to the
same person.

The tracking mechanisms presented in this work store
information on the user's computer. In other words, they are \emph{stateful}
mechanisms. Examples include \gls{HTTP} cookies and various forms of caches. In
contrast, \emph{stateless} mechanisms do not store information on the user's
computer but attempt to infer information by reading the browser state. A
tracker might, for example, learn which fonts are installed and infer from them
that a particular user runs Windows rather than Linux, or that they are
visiting with a mobile browser rather than a desktop one. This type of tracking
is also called \emph{device fingerprinting}. By combining enough such
attributes, trackers can uniquely identify a user or device, because no other
entity on the Internet is likely to present the same fingerprint. Stateless
tracking mechanisms are not discussed in this work; the focus is on stateful
tracking mechanisms instead.

\section{Methodology}
\label{sec:methodology}

This work gives an overview of tracking methods and defenses that have been
studied in the literature. To this end, a comprehensive review of the relevant
research is performed, with a focus on recent developments. Papers are
collected through digital libraries and search engines such as the \emph{ACM
Digital Library}, \emph{IEEE Xplore}, \emph{Google Scholar} and, for selected
works that are yet to appear in peer-reviewed journals, \emph{arXiv.org}.
Additionally, well-known journals and proceedings such as \emph{Computers \&
Security} and \emph{Proceedings on Privacy Enhancing Technologies} are searched
manually for relevant papers. The search terms include, but are not limited to,
\emph{Stateful Web Tracking}, \emph{Web Tracking}, \emph{Tracking Measurement}
and variants thereof. Furthermore, separate queries are made for the names of
particular tracking methods, for instance \emph{Cookie Synchronization}
(section~\ref{subsec:cookie synchronization}).

\section{Structure of the Thesis}
\label{sec:structure of the thesis}

The thesis is divided into two major parts: chapter~\ref{chap:tracking methods}
is concerned with how web sites track individuals, and
chapter~\ref{chap:defenses against tracking} presents ways for users to defend
themselves against those tracking methods. Chapter~\ref{chap:tracking methods}
is split into three parts, each focusing on a group of related tracking
methods. Chapter~\ref{chap:defenses against tracking} first presents ways in
which users can limit tracking with existing browser features. Its second part
discusses specialized tools, each of which focuses on one aspect of tracking,
and summarizes research on the effectiveness of these tools.
Chapter~\ref{chap:conclusion} concludes the thesis.