diff --git a/abbrev/acronym.tex b/abbrev/acronym.tex index 2bcf982..fd671c0 100644 --- a/abbrev/acronym.tex +++ b/abbrev/acronym.tex @@ -23,3 +23,4 @@ \newacronym {ISP} {ISP} {Internet Service Provider} \newacronym {SQL} {SQL} {Structured Query Language} \newacronym {CDN} {CDN} {Content Delivery Network} +\newacronym {ETag} {ETag} {Entity Tag} diff --git a/chapters/methods.tex b/chapters/methods.tex index fd7d887..87b7097 100644 --- a/chapters/methods.tex +++ b/chapters/methods.tex @@ -779,7 +779,34 @@ attack. Cache Control Directives can be supplied in the Cache-Control \gls{HTTP} header, allowing rules about storing, updating and deletion of resources in the cache to -be defined. +be defined. Caching also makes heavy use of \emph{\glspl{ETag}} and +\emph{Last-Modified \gls{HTTP} Headers} to determine whether a cached resource +is stale and needs to be updated. Commonly, a collision-resistant hash function +is used to generate a unique hash of a resource, which is sent along with +the resource in the response to the first \gls{HTTP} request. The resource and the hash, which is +carried in the \gls{ETag} header, are then cached by the client. On subsequent +retrievals of the same \gls{URL}, the client checks the expiration date of the +cached resource via the Cache-Control and Expires headers. If the resource +has expired, the client sends a request with the \emph{If-None-Match} field set +to the \gls{ETag}. The server then compares the \gls{ETag} received from the +client with the generated \gls{ETag} of the resource on the server side. If the +two values match (i.e., the resource has not changed), the server can send back +an \gls{HTTP} 304 Not Modified status. Otherwise, the answer contains a full +\gls{HTTP} response with the modified resource and the newly generated +\gls{ETag}, which the client can cache again. Usage of \glspl{ETag} can +therefore improve performance and cache consistency while at the same time +reducing bandwidth usage. 
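The revalidation exchange described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a faithful server implementation: the helper names \texttt{make\_etag} and \texttt{respond} are hypothetical, SHA-256 is assumed as the hash function, and only a single exact \gls{ETag} value is compared (real servers also have to handle lists of validators, the wildcard \texttt{*} and weak validators).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Hypothetical helper: derive a quoted validator from the resource body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(
    body: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Hypothetical server-side handler for one resource.

    Returns (status code, response headers, response body).
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # Validator matches: the cached copy is still current, send no body.
        return 304, headers, b""
    # No validator or a stale one: send the full resource and its ETag.
    return 200, headers, body


# First retrieval: the client has nothing cached yet.
status_first, headers_first, payload_first = respond(b"version 1")
cached_etag = headers_first["ETag"]

# Revalidation while the resource is unchanged on the server.
status_same, _, payload_same = respond(b"version 1", cached_etag)

# Revalidation after the resource has changed on the server.
status_changed, headers_changed, payload_changed = respond(b"version 2", cached_etag)
```

In this sketch the second call transfers no body at all, which is where the bandwidth saving comes from; the third call returns a new \gls{ETag} that the client would store in place of the old one.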
+ As with most other tracking methods, unique identifiers can be stored inside the +\gls{ETag} header because it offers a storage capacity of 81864 bits. Once the +identifier has been placed in the \gls{ETag} header, the server can always answer +revalidation requests for the resource with an \gls{HTTP} 304 +Not Modified status, effectively persisting the unique identifier in the +client's cache. In their 2011 survey of QuantCast.com's top 100 U.S.-based +websites, \citeauthor{ayensonFlashCookiesPrivacy2011} found \texttt{hulu.com} to +be using \glspl{ETag} as a backup for tracking cookies that are set by +\texttt{KISSmetrics} (an analytics platform). This allowed cookies to be +respawned after they had been cleared by reading the identifier back from the \gls{ETag} header. \subsection{DNS Cache} \label{subsec:dns cache}