\chapter{Introduction}
\label{chap:introduction}

The Internet has seen an unprecedented rise in traffic over the last few years, and this growth is still accelerating. As a result, an increasing amount of user data is sent over the Internet. This user data is analyzed by companies in industries such as social networking, advertising, Internet service provision and online news. Although many online services appear to be free for individual users, the companies behind them have to sustain themselves and turn a profit. This has led firms to work extensively with user data in order to extract meaningful information from the way users interact with their services. The collected and inferred information can then be sold to interested parties, which allows those parties to personalize their services, yielding higher customer engagement and thus higher profits. The end users themselves get the short end of the stick: they give away their data, often unknowingly, without gaining much in return.

Because the means of data collection on the Internet are becoming increasingly invasive and omnipresent, tools to defend against such privacy intrusions are being developed. Users benefit from knowing how web sites track them, because this knowledge allows them to protect themselves against these tracking mechanisms.

The aim of this thesis is to give an overview of tracking methods and of the tools available to defend against them. It seeks to answer the following research question: \emph{Which stateful tracking methods are used to track individuals on the Internet and which countermeasures exist?}

\section{Terms and Scope}
\label{sec:terms and scope}

This thesis focuses on web tracking as employed by, for example, advertising companies. When users visit a web site which uses third-party content from advertisers, those advertisers collect bits of information about the user.
These bits of information are not yet associated with a particular user but with an online identity, which is usually tied to a unique identifier. The unique identifiers are not meaningful by themselves because the same user might be assigned multiple unique identifiers, each corresponding to different bits of information. Tracking mechanisms are used to aggregate these pieces of information into one profile which approximates a user's personality, needs and wants. In many cases the goal is to persist tracking identifiers on the user's computer for as long as possible and to avoid assigning multiple identifiers to the same person.

The tracking mechanisms presented in this work are mechanisms which store information on the user's computer. In other words, they are \emph{stateful} mechanisms. Such mechanisms include \gls{HTTP} cookies and various forms of caches.

In contrast to stateful mechanisms, \emph{stateless} mechanisms do not store information on the user's computer but attempt to infer information from properties which the browser already exposes. For example, a tracker can learn which fonts are installed and infer from them that a particular user is running Windows rather than Linux, or that they are visiting with a mobile browser rather than a desktop browser. This type of tracking is also called \emph{device fingerprinting}. By combining enough such attributes into a fingerprint, trackers can uniquely identify a user or device, because no other entity is likely to use the Internet with the same fingerprint. Stateless tracking mechanisms are not discussed in this work; instead, the focus is on stateful tracking mechanisms.

\section{State of the Art}
\label{sec:state of the art}

Tracking methods have been actively researched in the past. Survey papers commonly focus on one aspect of the tracking ecosystem. \citet{mayerThirdPartyWebTracking2012} focus on third-party tracking and the policy surrounding it.
\citet{cahnEmpiricalStudyWeb2016} and \citet{englehardtCookiesThatGive2015} research cookies and their impact on online privacy. \citet{chaabaneBigFriendWatching2012} focus on how social networks are able to track users. \citet{roesnerDetectingDefendingThirdparty2012} study the tracking ecosystem and potential defenses built into the browser. \citet{belloroKnowWhatYou2018} discuss new methods and the extent to which they are used for tracking purposes. \citet{schelterUbiquityWebTracking2016}, \citet{englehardtOnlineTracking1MillionSite2016} and \citet{gonzalezCookieRecipeUntangling2017} perform large-scale analyses of third-party tracking covering one million or more web sites. \citet{papadopoulosCookieSynchronizationEverything2019} are concerned with cookie synchronization and its impact on tracking on the web. \citet{yuTrackingTrackers2016} survey popular browser add-ons for their tracking protection capabilities and \citet{xuUCognitoPrivateBrowsing2015} focus on the private browsing mode implemented by most modern browsers. Because browsers often do not provide the necessary protection, \citet{vlajicAnonymityTORUsers2017} take a closer look at the Tor network and the Tor Browser. \citet{merzdovnikBlockMeIf2017} are concerned with blocking tools such as \emph{Adblock Plus} as a defense against tracking on the Internet.

\section{Methodology}
\label{sec:methodology}

This work gives an overview of tracking methods and defenses which have been studied in the literature. To this end, a comprehensive review of the relevant literature is performed, with a focus on recent developments. Papers are collected through digital libraries and search engines such as the \emph{ACM Digital Library}, the \emph{IEEE Xplore Library}, \emph{Google Scholar} and, for selected works which are yet to appear in peer-reviewed journals, \emph{arXiv.org}.
Additionally, well-known journals and proceedings such as \emph{Computers \& Security} and \emph{Proceedings on Privacy Enhancing Technologies} are manually searched for relevant papers. The search terms used include, but are not limited to, keywords such as \emph{Stateful Web Tracking}, \emph{Web Tracking}, \emph{Tracking Measurement} and variants thereof. Furthermore, queries for the names of particular tracking methods are made. For instance, separate search queries are performed for information on \emph{Cookie Synchronization} (section~\ref{subsec:cookie synchronization}).

\section{Structure of the Thesis}
\label{sec:structure of the thesis}

The thesis is divided into two major parts: chapter~\ref{chap:tracking methods} is concerned with how web sites on the Internet track individuals and chapter~\ref{chap:defenses against tracking} presents ways for users to defend themselves against those tracking methods. Chapter~\ref{chap:tracking methods} is split into three parts, each focusing on a subset of tracking methods that can be grouped together. The chapter on defenses against tracking first presents ways in which users can use existing browser features to limit tracking. Its second part discusses specialized tools which focus on one aspect of tracking and summarizes research concerned with the effectiveness of these tools. The thesis is concluded in chapter~\ref{chap:conclusion}.