privacy-seminar/privacy-seminar.tex

\documentclass[a4paper,11pt]{article}

\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage[pdftex]{graphicx}
\usepackage[margin=1.25in]{geometry}
\usepackage{fancyhdr}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{units}
\usepackage{hyperxmp}
\usepackage[hyphens]{url}
\usepackage{csquotes}
\usepackage[pdftex, colorlinks=true,
					citecolor=black, filecolor=black,
					linkcolor=black, urlcolor=black,
					pdfproducer={LaTeX}, pdfcreator={LaTeX}]{hyperref}

\usepackage{color}
\usepackage{soul}
\pagestyle{fancy}

\fancyhf{}
\rhead{Privacy-Friendly Location-Based Services}
\lhead{Radboud University}
\rfoot{Page \thepage}
\setlength{\headheight}{14pt}

\usepackage[
backend=biber,
style=numeric,
sorting=none,
]{biblatex}

\usepackage{titling}
\newcommand{\subtitle}[1]{%Subtitle to add group number
  \posttitle{%
    \par\end{center}
    \begin{center}\large#1\end{center}}%
}

\addbibresource{privacy-seminar.bib} %Imports bibliography file
\setlength{\droptitle}{-10em}   % This is your set screw
% \setlength{\parindent}{0pt}
% \setlength{\parskip}{1ex plus 0.5ex minus 0.2ex}

\title{\line(1,0){350}\\\textbf{\Large{Privacy-Friendly Location-Based Services}} \line(1,0){350}\\} % Article title
\author{%
\small{\textbf{Christina Kreza}} \\
\footnotesize Radboud University \\
\footnotesize \href{mailto:christina.kreza@ru.nl}{christina.kreza@ru.nl}
\and
\small{\textbf{Sotiris Michaelides}} \\
\footnotesize Radboud University \\
\footnotesize \href{mailto:sotiris.michaelides@ru.nl}{sotiris.michaelides@ru.nl}
\and
\small{\textbf{Mark Juvan}} \\
\footnotesize Radboud University \\
\footnotesize \href{mailto:mark.juvan@ru.nl}{mark.juvan@ru.nl}
\and
\small{\textbf{Tobias Eidelpes}} \\
\footnotesize Radboud University \\
\footnotesize \href{mailto:tobias.eidelpes@ru.nl}{tobias.eidelpes@ru.nl}
\and
\small{\textbf{Giwrgos Baroutas}} \\
\footnotesize Radboud University \\
\footnotesize \href{mailto:giwrgos.baroutas@ru.nl}{giwrgos.baroutas@ru.nl}
}
\date{\small{\today}}
%\renewcommand{\maketitlehookd}{%
%\begin{abstract}
%\noindent Lorem Ipsum is simply a dummy text from the printing and typesetting industry. Lorem ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
%\end{abstract}
%}

\begin{document}
\maketitle

\section{Introduction}

Location-based services (LBSs) have grown in popularity in recent years, primarily due to the increasing availability and development of modern mobile devices such as smartphones and tablets. Any technology that relies on real-time location tracking to function, identifies the user's physical and geographical position in order to offer a variety of services, like advertising, billing, infotainment, navigation and security. Common specific examples include geo-social networks like Foursquare that offer searching, checking in and sharing location services, precise weather forecast services like Dark Sky and even point of interest finders such as Gasbuddy that provide real-time fuel price information, station locations and offerings.

Although these services are very useful by making people's lives easier and more convenient, their usage can also pose severe privacy concerns if the location of the users are not adequately protected. For example, disclosing precise user locations may allow an adversary to infer sensitive privacy information about the user, such as their home locations, political/religious beliefs and even health conditions.  Moreover, these data can be misused for massive surveillance, blackmailing, stalking, controlling and manipulating purposes.

Consumers, as well as service providers, have been paying close attention to this issue. Users are concerned about the misuse of their private location information and want more control over it. At the same time, because LBSs cannot be successfully marketed unless customers are comfortable using them, service providers have also significant incentives to alleviate customers' privacy concerns. As a result, location privacy principles have become important in order to ensure that users will continue to accept useful LBSs without trading off their private location information.

Researchers have long been aware of the possible privacy dangers associated with LBSs, and have developed a variety of interesting strategies to help users protect their privacy. In this paper, we present a comprehensive review of existing techniques and implementations that assist privacy-friendly LBSs.

The remainder of this paper is structured as follows: First, we introduce some risks and benefits of LBSs in Section~2. In Section~3, we identify the privacy threats of LBSs and the threat models used in LBS privacy protection research. In Section~4, we present the existing privacy metrics and their importance. Then, in Section~5 we examine three implementations of privacy-friendly LBSs, namely Uniform Grid Caching, Hiding in the Mobile Crowd and Privacy-preserving Electronic Toll Pricing system. In Section~6, we present a comprehensive classification of location privacy attacks. Finally, in Section~7 we discuss our findings and state our opinion on the aforementioned privacy-enhancing technologies.

\section{Risks and benefits of LBSs}

In this section, we first present the most common uses of LBSs. Then we analyze some clear benefits to both individual customers, service providers and society, but also some privacy concerning risks.

\subsection{Uses of LBSs}

Big tech companies and LBS providers have found many ways to use device location information in order to provide their services. The most common are the following.

\begin{itemize}
    \item \textbf{Navigation and travel information}: A LBS can give real-time information to a smartphone, tablet or any navigator such as traffic updates, congestion reports or even weather forecasts.
    \item \textbf{Store and service locators}: These services enable retail customers to locate the nearest store location and service according to their location.
    \item \textbf{Proximity-based marketing}: This implies that local businesses can target advertisements to people in their immediate vicinity.
    \item \textbf{Fleet and mobile workforce management tracking}: This type of LBS allows logistics-dependent organisations' employees to check in at predefined locations using their mobile devices in the field.
    \item \textbf{Inventory monitoring}: An integrated inventory and asset monitoring help businesses keep their equipment safe.
    \item \textbf{Fraud/theft prevention}: Credit card security issues can be reduced by using LBSs. An LBS can add a layer of protection by connecting a customer's location to a credit card transaction.
    \item \textbf{Augmented reality}: Location-based augmented reality technology offers an experience that is relevant to the user's location in order to enhance the user satisfaction and engagement.
\end{itemize}

\subsection{Benefits of LBSs}

From the perspective of a customer, LBSs clearly offer an improved and personalised customer experience. Instead of irritating end-users with generic advertisements and marketing offers, LBSs can provide them with relevant information and services. LBSs also provide an additional sense of security in times of emergency. For example, if there is an accident, ambulance services can respond immediately, and even people with health issues can be monitored and tracked for their safety. Furthermore, LBSs assist consumers in making the most of their travel experience and minimize fraud or theft risk.

Although not so clear, LBSs can offer many more benefits to service providers and companies that utilise those services. Businesses that use LBSs to identify their customers' buying patterns can acquire detailed insights into customer behaviour. Information about how many people are using a service or buying a product, how often they are doing that, and whether or not they take advantage of the offered discounts, assist businesses in profiling their customers and targeting them accordingly. After properly understanding the demands and requirements of their target consumers, companies may also link their LBSs to a mobile social network, allowing users to share information about the service or a product with their friends and other contacts. Companies may profit tremendously from this since it helps develop their user database and improves brand exposure. LBSs can also help companies maximize their marketing effort to increase the number of customers by informing new users near the business location about new offers, discounts and special events.

From a societal point of view, many studies on epidemics, demographics and healthcare field, using aggregated location data, showed that there are some clear benefits that LBSs can offer. According to a research~\cite{van2015real} on the possible impact of using location monitoring technology in clinical settings, healthcare facilities can be significantly improved. Real-time location services can be used to improve hospital administration. This technology may improve patient care by improving the amount of time clinical personnel can spend with patients, reducing expenses, and monitoring facility resources and security, all of which increase overall efficiency and profitability. Another research~\cite{luo2018large} demonstrates the efficacy of vaccination tactics based on geo-social interaction patterns in controlling an epidemic outbreak. This study eventually gave evidence to help public health organizations select optimal scales for disease control measures.

\subsection{Risks of LBSs}

Although LBSs have introduced innovative and profitable services and applications, they pose concerning risks to the users, service providers and other companies that use these collected data. When an individual utilizes LBSs, numerous data controllers may be present behind the scene such as the actual service provider, data brokers and/or third party vendors. This raises a number of issues for consumers, including questions about how their location data will be used, with whom the data will be shared and if the data will be collected and analysed further. All these intrusive business practises about the collection and misuse of data can lead to more offensive and manipulative marketing tactics as the profiling of customers becomes more and more personalised. Criminals can also use location data, in conjunction with other personal information, to identify an individual's location, enabling further criminal activities like theft, burglary, kidnapping, stalking and blackmailing.

From a business point of view, a challenging problem lies in balancing the functionality and efficiency of location-based data with the privacy and ethical concerns of employees and costumers. Employees face the issue that their employers use geolocation data to track them to evaluate their efforts. Even for justifiable reasons, these techniques are as intrusive as the offensive marketing strategies used on customers. At the same time, location-based information might also be detrimental to the business. For example, the knowledge that a group of employees or executives are at a specific location can disclose confidential information such as planning, processes, operations and style of work that can give a competitive advantage to business rivals behavior.

Concerns have also been voiced on the impact of the collection, classification and further use of location-based data to society. Many businesses are interested in acquiring location data even if their services do not require it, since selling such data to other businesses is an appealing and profitable proposition. This location-based data market has become so extensive in modern society that even governments are involved in the most dystopian and invasive way. Governmental organizations and agencies are participating in this location-based data market with justifiable intentions like security and safety of their citizens, but also for massive surveillance purposes, which raises another challenging task in balancing personal privacy with public safety and security.

\section{Threat models}

In order to evaluate the Privacy Enhancing Technologies (PETs) that we will analyse further on, we first need to understand what the attackers might want to get from the users and what means they have at their disposal in order to achieve their goals. To do so, we will identify three main types of threats related to user location privacy that arise from disclosed location information, as described in \cite{GAJJAR2017223}.

First, we can identify \emph{tracking threats}. In this type of threats, the attacker receives real-time updates concerning the user's location and uses this information to identify and predict the user's future travel routes with sufficient accuracy. In addition, the attacker can extract frequent trips by analysing mobility patterns.

Moreover, we have the \emph{identification threats}. This threat can be triggered by receiving sporadic updates about the user's location. This type of threat can lead to the attacker being able to identify the user's frequently visited locations, such as their home or work place. Eventually, getting access to these locations can lead to the disclosure of a user's identity.

Lastly, we have \emph{profiling threats} where the attacker might not have the available information to extract the user's identity but is still able to create a profile, adding their shopping habits or visits to religious or hospital establishments.

\section{Privacy metrics} % Tobias

Creators of privacy preserving LBSs want to be able to quantify the level of privacy a certain feature is able to achieve. Consumers of said privacy services also need such quantitative and qualitative measures to determine which services offer the most protection. Privacy is oftentimes a very abstract concept and does not lend itself easily to comparisons across different services. Nevertheless, it is possible to have metrics through which these services can be classified and categorized.

Privacy-enhancing technologies are often in direct opposition to quality of service. If a certain service is only able to provide accurate information by having access to individuals' personal data, enhancing the privacy of individuals might interfere with the quality of the service. This is especially relevant for LBSs, where increased resolution of location information is even more intrusive for the privacy of individuals. If researchers, creators and consumers are to navigate this trade-off between privacy and quality of service, they need to have access to privacy evaluation metrics. The following section describes various metrics to quantify privacy and, by extension, quality of service in LBSs.

\subsection{Location privacy and query privacy} % Tobias

\emph{Location privacy} is a term used to describe a user's private information which is directly related to location information. If a LBS provider knows the location of one user, it is able to infer additional information about the user. Location privacy is, therefore, concerned with two major questions \cite{shin_privacy_2012}:
\begin{itemize}
    \item Can a user be accurately located?
    \item Can additional information such as a user's habits or interests be inferred from the location data?
\end{itemize}

\emph{Query privacy}, on the other hand, concerns itself with private information a LBS provider is able to gain from observing a user's queries to its services. Queries in this context are requests a user issues to a LBS provider to get information about the user's surroundings. For example, a user can request all restaurants in a specific radius around their location. A slightly different query could be one where the user asks for the closest gas station. For query privacy we are interested in two different questions \cite{shin_privacy_2012}:
\begin{itemize}
    \item Can a user be identified?
    \item Can additional information such as a user's habits or interests be inferred from the query itself?
\end{itemize}

Although \emph{location privacy} and \emph{query privacy} are closely related, they are different. If a LBS provider can accurately pinpoint and track a user's location, the user's queries will eventually reveal their identity to the provider. In this scenario, compromising the user's location privacy allows the provider to also compromise the user's query privacy. Likewise, if a user can be accurately identified through their queries (query privacy is compromised), the provider will be able to pinpoint the user's location (location privacy is compromised). However, it is possible that one of the two is compromised while the other remains intact. A user can issue a query which could be coming from any other $k-1$ users in the same area. The provider knows that all $k$ users are in a large area covering their location, but the query is not attributable to only one user. It follows that the user's query privacy is protected while the location privacy is compromised \cite{shin_privacy_2012}. The concept of hiding a user's queries in a crowd of $k-1$ other users is central to many Privacy Enchancing Technologies (PETs).

\subsection{\boldmath\texorpdfstring{$k$}{k}-Anonymity} % Tobias

The previous example illustrates an important concept in maintaining privacy in LBSs: $k$-anonymity. It originates from database research where a release of information (e.g. part of a database table) is anonymised by aggregating all rows into $k$ groups \cite{sweeney_k-anonymity_2002}. The parameter $k$ is adjusted to obtain the required level of privacy. The higher $k$, the better protected the private information in the release is. Unfortunately, a higher $k$ often results in a noticeable decrease in data accuracy because uniquely identifying information has to be stripped from the release. This fact contributes to the privacy versus service trade-off inherent in most privacy-friendly LBSs.

In the context of LBSs, $k$-anonymity is used to aggregate location information among $k$ users. Instead of aggregating over rows in a database, queries from multiple users are aggregated and modified such that each query is the same as $k-1$ other users' queries. The LBS provider will receive the modified queries which only ask for results for a specific area or time frame. The link between a user's location information and their identity is severed because all $k$ users are in the specific area in the specified time frame, but the LBS cannot pinpoint an individual user's location. If the location information is anonymised using an area or a time frame, it is called \emph{spatial cloaking} or \emph{temporal cloaking} respectively.

\subsection{Application of \boldmath\texorpdfstring{$k$}{k}-Anonymity} % Tobias

\textcite{gruteser_anonymous_2003} use spatial cloaking and temporal cloaking to anonymise user queries sent to a LBS provider. Their algorithm runs on a central server which collects, modifies and forwards user queries to the LBS provider. The principle is that the spatial and temporal accuracy of location information can be decreased as needed to achieve a predefined level of anonymity. This level is expressed with the $k$ factor in a $k$-anonymity setting. The algorithm receives as input the required minimum level of anonymity $k_{min}$, a user's accurate location, coordinates of the area covered by the anonymizing server and the position of other users in the same area. It will divide the area around the user's location iteratively into quadrants until at least one quadrant contains less than $k_{min}$ users. The result is the area as divided by the second to last iteration, since that iteration still satisfies the $k_{min}$ constraint.

\begin{figure}
    \centering
    \includegraphics[width=0.75\textwidth]{img/gruteser-areas.png}
    \caption{Four different 2000m$\times$2000m areas in Denver, Colorado. The thickest lines represent expressways, the medium lines arterials and the thin lines collector streets. Areas 1 and 2 contain mostly expressways and areas 3 and 4 mostly collector streets \cite{gruteser_anonymous_2003}.}
    \label{fig:gruteser-areas}
\end{figure}

\textcite{gruteser_anonymous_2003} simulate their algorithm on the four areas shown in figure~\ref{fig:gruteser-areas}. They obtained traffic statistics for the city of Denver to construct a realistic scenario. The results of the algorithm are shown in figure~\ref{fig:gruteser-results}. For areas with relatively low traffic (areas 3 and 4), the temporal resolution has to be severely lowered to achieve a spatial resolution of \unit[15]{m} with a comparatively high anonymity level of $k=10$ (blue line). Conversely, the spatial resolution in highly frequented areas can be improved even more by introducing a time delay of around \unit[15]{s} (red line). These results highlight the trade-off between spatial resolution, temporal resolution and anonymity level. These three parameters are highly dependent on the context of the application. \textcite{gruteser_anonymous_2003} argue that a high spatial but low temporal resolution (as with the blue line) might be acceptable for road hazard detection, for example. It is not necessary to identify the road hazard within seconds, but it is important to do so with high spatial accuracy and a high level of anonymity.

There are multiple drawbacks to the algorithm proposed by \textcite{gruteser_anonymous_2003}. First, it relies on a central server to anonymise the queries from each user. This central server has to be trusted and might be a hindrance to adoption if the party operating the server(s) has a conflict of interest. Second, a spatial resolution of \unit[15]{m} in the best case is not enough for a wide range of applications. For precise location tracking in a city with mobile phones, if a user is shown to be on the other side of a street, it will impact quality of service significantly. Third, the algorithm has bad quantisation in certain cases. If the last iteration satisfies an anonymity level of $k=4$, which is just below the required level of $k=5$, but the second to last iteration gives a level of $k=10$, the algorithm is far from optimal.

\begin{figure}
    \centering
    \includegraphics[width=0.6\textwidth]{img/gruteser-results.png}
    \caption{A decrease in temporal resolution provides an increase in spatial resolution for a fixed anonymity level $k$ \cite{gruteser_anonymous_2003}.}
    \label{fig:gruteser-results}
\end{figure}

\section{Implementations of privacy-friendly LBSs} % Mark

In this section we will present some implementation of privacy-friendly LBSs that we have found interesting. The papers we have found describe Uniform Grid Caching \cite{ZHANG2018881},
% DKM and DPP approaches \cite{DKM, DPP}, PPCS \cite{PPCS},
Hiding in the Mobile Crowd \cite{mobilecrowd} and Privacy-preserving Electronic Toll Pricing system \cite{balasch_pretp_2010}.

\subsection{Uniform grid caching}

This implementation \cite{ZHANG2018881} utilizes a converter that transforms a user defined grid into a universal one, then utilises the order-preserving symmetric encryption, caching and $k$-anonymity in an attempt to preserve location privacy of the user. We will describe the protocol and explain the background concepts to the reader.

\subsubsection{Motivation}

Since similar or the same location-based queries repeat frequently over time, we can utilize caching. This significantly lowers the amount of queries to the LBS servers and increases the privacy of some user queries. However, the caching only works when there is a uniform system to determine when a certain query is cached and when a completely new query must be made to the LBS server. To achieve this, the paper proposes a uniform grid and caching (UGC) scheme which we will explain later.

The usage of anonymisers is not a novel idea presented by this article. Most literature, however, proposes a fully trusted third party, which can be troublesome---the user does not wish to be tracked, not by the LBS provider nor by other third parties, even if they are trusted. This paper presents the use of a semi-trusted third party, which never gets to know the exact location of the user and cannot by itself compromise their privacy.

\subsubsection{Grid structure conversion}

When querying on a specific region, the user must first define which area on the grid they are interested in. First, the query area is split into grid cells of equal size. To transform individually defined grids into universal ones, the user sends their grid to the converter entity, who then performs same query or expanded query area conversion. The resulting query area is the uniform grid determined by the lower-left and upper-right coordinate. In this area, each grid cell is defined by a unique identifier.

Besides the uniform grid structure, the converter also generates a group of keys used for encrypting the grid identifier, location of POIs, integrity verification and OPSE.

\subsubsection{Creating the query request to the anonymiser}

After the user has obtained a uniform grid structure and the keys, they specify the query range with the radius around their position. With this information, they can obtain the query spatial region in the uniform grid structure. Corresponding to this region is a number of grid cells and their identifiers. Each identifier is hashed and encrypted with the grid identifier key. The user transmits these encrypted identifiers to the anonymiser which tries to find them in the cache. If there is a hit, the anonymiser returns the POIs to the user and the query is finished. However, if there is not a hit, we need to form $k$-anonymity.

\subsubsection{Forming a cloaking region}

To achieve this, the user must locally find $k-1$ users with the same POI. Each user specifies their interest radius and together they set each person's query spatial region. Then the lower-left and upper-right coordinates of each region are encrypted using OPSE: by utilising two B-tree structures that contain all encrypted plaintext values of the X and Y axis coordinates. The left subtree of each node has smaller values than the node and the right subtree has bigger values than the node. Using this, the users can send the encrypted coordinates to the anonymiser, which can extract the minimum and maximum values out of the B-trees without knowing their actual (decrypted) values. With this knowledge, we have achieved $k$-anonymity and formed a cloaking region.

\subsubsection{Querying LBS}

The anonymiser now has the cloaking region and after obtaining the encrypted query contents from the user, the anonymiser sends both to the LBS server. Having also received the keys from the converter, the LBS server can decrypt the cloaking region to obtain its actual coordinates and the query itself. With this information, the LBS queries its databases and obtains the POIs. For each POI, the LBS also calculates its grid cell identifier and computes its encrypted hash. The encrypted POIs and their encrypted and hashed grid identifiers are sent back to the anonymiser, where they are first cached and then returned to the user. Lastly, the user decrypts the received information and filters out the useful results.

%removed
%\subsection{DKM and DPP}%christina
%These two implementations\cite{DKM, DPP} attempt to prevent user trajectory tracking and query privacy. They both utilize dual-K mechanism and Dual Privacy Preserving schemes to achieve that. We will discuss both protocols and outline the differences.

%removed
%\subsection{PPCS}%Mark
%This paper \cite{PPCS} describes how to preserve privacy, when the LBS provider already knows how to differentiate the probability of the user being located in some area. We will describe the background theory and the protocol this paper proposes.

\subsection{Hiding in the mobile crowd: location privacy through collaboration}

\textcite{mobilecrowd} propose a crowd-sourcing solution to reduce the amount of tracking done by LBS providers. We will explain the motivation behind such an implementation and the protocol that this paper presents.

\subsubsection{Motivation}

MobiCrowd idea utilises the fact that nowadays most smartphones support the usage of LBS. With this in mind, there are many queries performed at nearby locations concurrently. So instead of querying the LBS servers each time, the users can first query locally if someone has sent the same query recently. By doing this, we not only preserve the query privacy of one query, but potentially prevent tracking of the user's movement trajectory. Since people are generally more likely to gather at populated areas, the effectiveness of this method is increased---the information can stay in the area for some time instead of being queried to the server each time.

%Provided that there are users willing to cooperate in order to increase their location and query privacy, this approach works increasingly well with the amount of users participating. \\

%There is also no implementation requirements on the LBS side, meaning the users can use this system without a specialised network or infrastructure.

\subsubsection{Centralised vs user-centric LBS solutions}

Centralised solutions are comprised of TTP solutions and those who require LBS to change its operation or infrastructure. The first usually utilize a TTP server as an anonymiser. But the problem that arises is why should we trust TTP, if we do not trust LBS. Another target for attacking arises as well. Even if we ignore these two issues, the anonymiser can become a tight bottleneck on bigger networks.

The latter is usually infeasible, since it is costly and mostly misaligned with the interests of LBS---we wish to reduce the amount of information the LBS providers receive and companies will not invest resources to help us achieve that goal.

On the other hand, user-centric solutions operate on user devices, usually by submitting inaccurate coordinates or obfuscating the position in other ways, but still fall prey to some of the attacks mentioned later in our paper.

The approach of the MobiCrowd scheme proposed in the paper we are describing is to minimise the amount of queries to LBS when the information is already available in the devices in the local network. This idea requires many users to participate, but with the idea of preserving their own privacy in mind, MobiCrowd could be very effective.

\subsubsection{Scheme}

On each user's device, a mobile transparent proxy maintains a buffer with LBS query responses, along with their signatures and expiry date. The devices that hold some location-specific information are called informed users for that area. The users interested in the information are termed seekers of that region. When a seeker is interested in obtaining some information about a specific area, they send a local broadcast query instead of querying the LBS. Should some informed user already have the information in their buffer, they may respond to the query through the local wireless ad hoc interface of the seeker device. Of course, this is only possible when the informed user is actually willing to take part in the exchange. If they agree, the local reply is carried out and the seeker becomes informed without disclosing any information to the LBS provider.

We have to keep in mind however, that should there be no informed users in the area, the seeker still has to query the LBS. In other words there is no crowd to hide in---again proving that this system works best in high informed user density areas.

\subsection{Privacy-preserving Electronic Toll Pricing System}

In this part of the paper, another privacy preserving technology is presented, as described on \cite{balasch_pretp_2010}. PrETP, a privacy preserving Electronic Toll Pricing System, relies on on-board units (OBUs) which can prove that they use genuine data and perform the correct operations while disclosing the minimum amount of location data.

\subsubsection{Main idea of the protocol}

Electronic Toll Pricing implementations are currently used in order to define the amount of money a driver should pay for using the roads. In order to do so, many implementations rely on OBUs which collect the location data of the corresponding cars. At the end of the tax year, the corresponding fee is calculated and sent to the service provider (SP). In order for the service provider to be convinced that the calculations performed during the fee computations are legitimate, the OBUs send along all the gathered location data to the service provider. This situation constitutes a threat to user privacy.

The system proposed by \cite{balasch_pretp_2010} tries to solve the predefined problem. PrETP still relies on OBUs while the main difference lies in the fact that even though the fee is computed locally, only a minimum amount of location data needs to be sent over to the service provider. In order to succeed at that, instead of the location data of the corresponding car, the cryptographic protocol of PrETP (Optimistic Protocol) sends along the final fee commitments to the locations and to the prices used during the fee computation. These commitments aim to reveal no further information about the location and the prices used, unless it is necessary.

\subsubsection{Security goals}

Even though the protection of user privacy is the main goal of the protocol, the interests of the Toll Charger (TC) and the Toll Service Provider (TSP) should also be guaranteed. Thus, some security goals have been set.

\begin{itemize}
    \item The system should be able to detect if a malicious driver has shut down his OBU in order to simulate that he drove less.
    \item The system should be able to detect if a malicious driver has tampered and altered the GPS signal in order to simulate a cheaper route.
    \item The system should be able to detect if a malicious driver has assigned arbitrary prices to the roads during the calculation of fee in order to reduce the cost.
    \item The system should be able to detect if a malicious driver has assigned an arbitrary final fee instead of the result of the correct calculations in the OBU.
\end{itemize}
It is important to state that the protocol's authors decided to focus on the detection of malicious activities instead of the prevention in order to reduce the production cost.

\subsubsection{Technical requirements}

In order to understand the Optimistic Payment Protocol we will need to define some technical concepts that will be used. Firstly, we will use \emph{signature schemes}. A signature scheme consists of three algorithms, namely the Signature Key Generation, the Signature Sign and the Signature Verify. After the creation of a key pair (public key, private key), one can sign a message to output a valid signature. The validity of the signature can be checked with the Signature Verify algorithm. We would like our signature scheme to be correct and unforgeable.

Moreover, we have the \emph{commitment schemes}. Most importantly, we would like to be able to produce commitments and open them when necessary. The commitment algorithm outputs a commitment $c_x$ on a message $x$, and auxiliary information $open_x$. We say that a commitment is opened when we reveal the pair $(x, open_x)$ and the Open algorithm outputs true. We would like our commitment scheme to have a hiding and a binding property. Additionally, the additively homomorphic property should be fulfilled.

Lastly, we will use \emph{proof of knowledge}. During this protocol, the prover must prove to the verifier knowledge of a secret value without actually disclosing this value.

\subsubsection{Optimistic Payment Protocol}

The cryptographic protocol consists of three phases. In the first phase, and during each tax period, the OBU collects the location data of the corresponding car. In order to calculate the fee, it slices the trajectories in segments and stores the information into tuples, $(loc, time)$. This tuple is mapped to a price based on a function defined by the Toll Service Provider. For each tuple, the OBU calculates the price $p$, the hash $h$ of the tuple, the commitment to the price $c_p$, and a proof $\pi$ that the committed price was obtained using the TSP's function.

During the second phase, at the end of the tax period, the OBU should add all the subfees calculated throughout the year to obtain the final $fee$. Additionally, it adds all the openings $open_p$ in order to obtain the final opening, $open_{fee}$. The message $m$ that the OBU should send to the TSP consists of the pair $(fee, open_{fee})$ and all the payment tuples calculated on the first phase. The OBU, finally, signs the message and sends it to the TSP along with its signature. Upon receiving this information from the OBU, the TSP verifies the validity of the signature, adds the commitments of all the payment tuples using the homomorphic property to obtain $c'_{fee}$ and checks whether the tuple $(fee, open_{fee})$ is a valid opening for the calculated commitment $c'_{fee}$.

Lastly, on the third phase, the Toll Charger (TC) obtains a proof $\phi$ that a car was spotted in a certain location at a specific time. The toll charger will relay this proof to the TSP in order to investigate the validity of the calculations conducted by the OBU throughout this specific time frame. Upon receipt of the proof, the OBU searches for the payment tuple for which the information of the proof $\phi$ matches the $(loc, time)$ stored in the tuple. Once the payment tuple is found, the OBU will send the number of the tuple to the TSP together with the preimage of $h$ and the opening $(p, open_p)$. In order to check the validity of the calculations, the TSP will check whether $(p, open_p)$ is a valid opening of $c_p$ and whether $(loc, time)$ is indeed the preimage of $h$.

\subsubsection{Evaluation and efficiency}

While the Optimistic Payment Protocol succeeds at preserving the privacy of the driver with the use of commitments, practical issues arise and lead us to reevaluate the efficiency of the protocol. The main issue of the system arises when the OBU is not able to calculate a payment tuple for a route, for example due to the fact that the route consists of two roads with different costs. This can happen when the driver leaves a highway and enters a secondary road. Even though there are solutions for the described problem mentioned in \cite{balasch_pretp_2010}, the described protocol cannot be efficiently implemented, in our opinion. Nevertheless, this protocol is still a good example of how we can use cryptographic schemes to succeed at preserving user privacy while convincing the other party that our calculations were conducted legitimately.

\section{Location privacy attacks} %Sotiris

In this section, we first give a taxonomy of different attackers based on the knowledge they use to obtain user's private location information. Then, we categorise various attacks on location privacy as they were classified in the paper~\cite{wernke2014classification}.

\subsection{Attacker knowledge}%Sotiris

Attacker knowledge is categorized according to two information features: temporal information and context information. In accordance with the temporal feature, the attacker is distinguished according to the amount of user's spatial information that is accessible. on one occasion, the attacker is only aware of a single user position and on the other, the attacker has access to a collection of several position updates across time. In the context feature dimension, the attacker is identified by the existence of extra,
beyond spatiotemporal information, context knowledge of the user. A sophisticated attacker, for example, would have extra context information for a user provided by a social network accounts, public information, a map, demographic data, etc. In addition to the known spatiotemporal information, the attacker can utilize this extra information to collect and determine other points of interest. For instance, an advanced attacker can use data collected by social media accounts to determine the user's favourite places, everyday activities and even home address in order to disclose someone's current location.

\subsection{Location privacy attacks}%Sotiris

The classification that is proposed in~\cite{wernke2014classification} categorizes the attacks in single position attacks, context linking attacks, multiple position attacks, attacks combining context linking with multiple position attacks, and attacks based on compromising a Trusted Third Party (TTP).

\subsubsection{Single position attacks}%Sotiris

The single position attack is based on the attacker analysing a single query from a user to deduce further spatial or identity information concealed by the user. Single position attacks are divided into location homogeneity attacks and location distribution attacks.

A location homogeneity attack can be effectively utilized, against basic $k$-anonymity schemes. The attacker examines all $k$-cluster members' location and if their position is almost similar, each members almost precise spatial information can be disclosed. On the contrary, the location information is safeguarded if the cluster members are spread out across a broader region.

On the other hand, location distribution attacks are based on the fact that users are not always evenly dispersed in a cluster space and are equally effective against some $k$-anonymity schemes. Taking for example a $k$- anonymity mechanism on a $k$-cluster with members spread throughout a both densely and sparsely inhabited region. The identity of a protected user whose location is in a sparsely populated area and far away from the other users, is likely to be revealed because the cluster's obfuscation area must be extended to the more dense area in order to cover the desired amount of users for $k$-anonymity to work.

\subsubsection{Context linking attacks}

In a context linking attack, the attacker takes advantage of personal context knowledge as well as external background knowledge in conjunction with spatiotemporal information in order to reduce user privacy. This type of attack is partitioned into the personal context linking attack, the probability distribution attack, and map matching attack.

Individual's personal context information such as preferences or hobbies can be exploited to launch a personal context linking attack. Assume, for example, that a person has a passionate interest in movies and he likes to frequently visit cinema at premiere nights. An advanced attacker, acquiring this knowledge, can further reduce the obfuscated area to regions with a cinema, increasing his ability of discovering the user's location.

In probability distribution attacks, the attacker attempts to derive a probability distribution function of the user's location over the obfuscated region. If this distribution is not uniform, the attacker is able to discover places where the user is likely to be located.

By removing all unimportant places, map matching may be used to reduce the obfuscation area to more specific locations where a user's position can be easier located. For instance, with the use of a map, an attacker can eliminate uninhabited regions like lakes and forests from the obfuscated area, thereby limiting the extent of the effective examined region.

\subsubsection{Multiple position attacks}

In a multiple position attack, an attacker records and correlates various user's spatial queries in order to reduce user privacy and define a user's location. There are four different types of this attack, namely identity matching attack, multiple query attack, location tracking attack and maximum movement boundary attack.

First, in an identity matching attack, the attacker targets user's many pseudonyms, generated to protect his identity. By connecting and matching these pseudonyms to the same identity based on similar or correlated features , he has the capability to compromise the privacy of the modified pseudonyms, and further identify the user.

Second,  a multiple query attack involves the attacker analysing many queries and updates, giving him the opportunity to launch a shrink region attack, as well as a region intersection attack. The former one involves an attacker that keeps track of successive queries from a $k$-anonymity set. By correlating these queries, the attacker is able to derive the identity and location of the user that provided the original query. In a region intersection attack, the attacker utilizes many inaccurate spatial updates in order to compute their intersection. Following that, he further deduces the user's privacy-sensitive zones, resulting in an examining area the user is more likely to be situated in.

Next, in a location tracking attack, the attacker takes advantage of many known position updates. For instance, even if an obfuscation technique is implemented, an attacker can correlate subsequent pseudonyms by matching spatiotemporal information of successive updates in order to recreate the movement of a user who is randomly changing pseudonyms in every query.

Lastly, in a maximum movement boundary attack, the attacker determines the greatest movement boundary region where the user might have traveled between two subsequent spatial updates. In this way, the position of the initial update assists the attacker in increasing the precision of the update of the next query by excluding any unreachable area, resulting in a bounded maximum movement area for examination.

\subsubsection{Combination of multiple position and context linking attacks}

Instead of utilising one of the illustrated attacks, an attacker depending on his capabilities and assets, can combine them or use them in sequence in order to compromise the user's location. For instance, an attacker can first use personal context knowledge to reduce the examined region to an area that includes the user's frequent activities (e.g. cinema) and then launch a maximum movement boundary attack to precisely resolve the direction of movement toward the user's activity (e.g. specific cinema).

\subsubsection{Compromised TTP}

The compromise of a trusted third party (TTP) illustrates how an attacker may get direct access to users' data on a TTP and take advantage of it. An attacker then, can use, modify and sell these data. Although not so common, this attack is plausible and not insignificant.

\section{Conclusion}

%In this section we will discuss our findings and state our opinions on why some techniques are better than others and which use cases they cover, technically and societally. We will also discuss how practical the aforementioned PET are and how they could be improved.

We believe there are many uses of LBSs in the modern society. Each day, we rely on various navigation applications on our smartphones to guide us to a preferred destination, to find places of interest nearby or even share our location with our friends, so we can find each other easier. However most of LBS providers use the query data for much more than just providing us with their information. It is not clear if this is legitimate or not---their data is given for free, in exchange for query information.

To tackle this, many PETs were created. We discussed some in this paper, ranging from simple $k$-Anonymity implementations to meticulously thought out toll payment systems. We are concerned with the real-world implementation of these PETs as they mostly function in separate scenarios and are not built in a modular fashion. Many of them require an additional implementation on the LBS infrastructure, which is unlikely to happen due to a clash of interests---by implementing these PETs, the data LBS providers gather, analyse and potentially sell is less detailed and thus less valuable. This is why we do think, that a law requiring such techniques to be used by LBS providers is the only realistic scenario in which these PETs will be implemented.

There are also some concepts in which the implementation is entirely separated from LBS infrastructure, for example using TTPs or mobile proxies. We believe that such concepts are much more realistic in their implementation possibility and usability. It should still be noted, that this infrastructure can also be attacked and the trust of users can still be exploited elsewhere.

Based on our research, we predict that in the future, there will be an increase of studies and PET implementations, as well as an increase in public attention and awareness in the area.

\cleardoublepage
%\addcontentsline{toc}{chapter}{Literature}

%\printbibliography[heading=bibintoc,type=article,title={Articles}]

%\printbibliography[heading=bibintoc,type=inproceedings,title={Inproceedings}]

%\printbibliography[heading=bibintoc,type=incollection,title={Book chapters}]

\printbibliography[heading=bibintoc,title={Literature}]
\end{document}