\documentclass[runningheads]{llncs}

\usepackage{graphicx}
\usepackage[backend=biber,style=numeric]{biblatex}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{csquotes}

\hypersetup{
  colorlinks=true,
  linkcolor=black,
  urlcolor=blue,
  citecolor=black
}

\addbibresource{trustworthy-ai.bib}

\begin{document}

\title{Trustworthy Artificial Intelligence}
\author{Tobias Eidelpes}
\authorrunning{T. Eidelpes}
\institute{Technische Universität Wien, Karlsplatz 13, 1040 Wien, Austria
\email{e1527193@student.tuwien.ac.at}}

\maketitle

\begin{abstract}
The abstract should briefly summarize the contents of the paper in 150--250 words.

\keywords{Artificial Intelligence, Trustworthiness, Social Computing}
\end{abstract}

\section{Introduction}
\label{sec:introduction}

The use of artificial intelligence (AI) in computing has seen an unprecedented rise over the last few years. From humble beginnings as a tool to aid humans in decision making to advanced use cases where human interaction is avoided as much as possible, AI has transformed the way we live our lives today. The transformative capabilities of AI are not just felt in computer science, but have bled into a diverse set of other disciplines such as biology, chemistry, mathematics and economics. For the purposes of this work, AIs are machines that can learn, make decisions autonomously and interact with their environment~\cite{russellArtificialIntelligenceModern2021}.

While the possibilities of AI are seemingly endless, the public is slowly but steadily learning about its limitations. These limitations manifest themselves in areas such as autonomous driving and medicine, fields where AI can have a direct—potentially life-changing—impact on people's lives. A self-driving car operates on roads where accidents can happen at any time. Decisions made by the car before, during and after an accident can result in severe consequences for all participants. In medicine, AIs are increasingly used to drive human decision-making. The more critical the proper use and functioning of AI is, the more trust in its architecture and results is required. Trust, however, is not easily defined, especially in relation to artificial intelligence.

This work will explore the following question: \emph{Can artificial intelligence be trustworthy, and if so, how?} To be able to discuss this question, trust has to be defined and dissected into its constituent components. Section~\ref{sec:modeling-trust} analyzes trust and molds the gained insights into a framework suitable for interactions between humans and artificial intelligence. Section~\ref{sec:taxonomy} approaches trustworthiness in artificial intelligence from a computing perspective. There are various ways to make AIs more \emph{trustworthy} through technical means; that section discusses and summarizes important methods and approaches. Section~\ref{sec:social-computing} discusses combining humans and artificial intelligence into one coherent system that is capable of achieving more than either of its parts on its own.

\section{Trust}
\label{sec:modeling-trust}

In order to define the requirements and goals of \emph{trustworthy AI}, it is important to know what trust is and how we humans establish trust with someone or something. This section therefore defines and explores different forms of trust.

\subsection{Defining Trust}

Commonly, \emph{trusting someone} means to have confidence in another person's ability to do certain things.
This can mean that we trust someone to speak the truth to us or that a person competently does the things that we \emph{entrust} them to do. We trust the person delivering the mail to do so on time and without mail getting lost on the way to our doors. We trust people knowledgeable in a certain field, such as medicine, to advise us when we need medical advice. Trusting in these contexts means to cede control over a particular aspect of our lives to someone else. We do so in the expectation that the trustee does not violate our \emph{social agreement} by acting against our interests. Oftentimes we are not able to confirm that the trustee has indeed done his/her job. Sometimes we only find out later that what was done was not in line with our own interests. Trust is therefore also always a function of time. Previously entrusted people can—depending on their track record—either continue to be trusted or lose our trust.

We not only trust certain people to act on our behalf; we can also place trust in things rather than people. Every technical device or gadget receives our trust to some extent, because we expect it to function as intended. This relationship encompasses \emph{dumb} devices such as vacuum cleaners and refrigerators, as well as seemingly \emph{intelligent} systems such as algorithms performing medical diagnoses. Artificial intelligence systems belong to the latter category when they function well, but can easily slip into the former, for example in the case of a poorly trained machine learning algorithm that always classifies pictures of dogs and cats as dogs.

Scholars usually divide trust into either \emph{cognitive} or \emph{non-cognitive} forms. While cognitive trust involves some sort of rational and objective evaluation of the trustee's capabilities, non-cognitive trust lacks such an evaluation. For instance, if a patient comes to a doctor with a health problem which lies within the doctor's domain, the patient will place trust in the doctor because of the doctor's experience, track record and education. The patient thus consciously decides that he/she would rather trust the doctor to solve the problem than a friend who has no expertise in the matter. Conversely, non-cognitive trust allows humans to place trust in people they know well, without a need for rational justification, simply because of their existing relationship.

Due to the different dimensions of trust and its inherent complexity in different contexts, frameworks for trust are an active field of research. One such framework—proposed by \textcite{ferrarioAIWeTrust2020}—will be discussed in the following sections.

\subsection{Incremental Model of Trust}

The framework by \textcite{ferrarioAIWeTrust2020} consists of three types of trust: simple trust, reflective trust and paradigmatic trust. Their model is thus given by the triple
\[
T = \langle\text{simple trust}, \text{reflective trust}, \text{paradigmatic trust}\rangle
\]
and a 5-tuple
\[
\langle X, Y, A, G, C\rangle,
\]
where $X$ and $Y$ denote interacting agents and $A$ the action to be performed by the agent $Y$ to achieve the goal $G$. $C$ stands for the context in which the action takes place.

\subsubsection{Simple Trust}
is a non-cognitive form of trust and the least demanding form of trust in the incremental model. $X$ trusts $Y$ to perform an action $A$ to pursue the goal $G$ without requiring additional information about $Y$'s ability to generate a satisfactory outcome.
In other words, $X$ \emph{depends} on $Y$ to perform an action. $X$ has no control over the process and does not want to control it or the outcome. A lot of day-to-day interactions happen in some form or another under simple trust: we (simply) trust a stranger on the street to show us the right way when we are lost. Sometimes simple trust is unavoidable because of the trustor's inability to obtain additional information about the other party. Children, for example, have to simply trust adults not because they want to but out of necessity. This changes as they grow older and develop the ability to better judge other people.

\subsubsection{Reflective Trust}
adds an additional layer to the simple trust model: trustworthiness. Trustworthiness can be defined as the cognitive belief of $X$ that $Y$ is trustworthy. Reflective trust involves a cognitive process which allows a trustor to obtain reasons for trusting a potential trustee. $X$ believes in the trustworthiness of $Y$ because there are reasons for $Y$ being trustworthy. Contrary to simple trust, reflective trust includes the aspect of control. For an agent $X$ to \emph{reflectively} trust another agent $Y$, $X$ has objective reasons to trust $Y$ but is not willing to do so without control. Reflective trust does not have to be expressed in binary form but can also be expressed by a subjective measure of confidence. The more likely a trustee $Y$ is to perform action $A$ towards a goal $G$, the higher $X$'s confidence in $Y$ is. Additionally, $X$ might have high reflective trust in $Y$ but still not trust $Y$ to perform a given task because of other, potentially unconscious, reasons.

\subsubsection{Paradigmatic Trust}
is the last form of trust in the incremental model proposed by \textcite{ferrarioAIWeTrust2020}. In addition to having objective reasons to trust $Y$, $X$ is also willing to do so without control. It is thus a combination of simple trust and reflective trust. Simple trust provides the non-cognitive, non-controlling aspect of trust and reflective trust provides the cognitive aspect.

\subsection{Application of the Model}

Since the incremental model of trust can be applied to human-human as well as human-AI interactions, an example which draws from both domains will be presented. The setting is that of a company which ships tailor-made machine learning (ML) solutions to other firms.

On the human-human interaction side there are multiple teams working on different aspects of the software. The hierarchical structure between bosses, their team leaders and their developers is composed of different forms of trust. A boss has worked with a specific team leader in the past and thus knows from experience that the team leader can be trusted without control (paradigmatic trust). The team leader has had this particular team for a number of projects already but has recently hired a new junior developer. The team leader has some objective proof that the new hire is capable of delivering good work on time due to impressive credentials, but needs more time to be able to trust the new colleague without control (reflective trust).

On the human-AI side, one developer is working on a machine learning algorithm to achieve a specific goal $G$. Taking the 5-tuple from the incremental model, $X$ is the developer, $Y$ is the machine learning algorithm and $A$ is the action the machine learning algorithm takes to achieve the goal $G$. In the beginning, $X$ does not yet trust $Y$ to do its job properly. This is due to the absence of any past performance metrics showing that the algorithm achieves $G$. While most, if not all, parameters of $Y$ have to be controlled by $X$ in the beginning, less and less control is needed if $Y$ achieves $G$ consistently. This also increases the cognitive trust in $Y$ as time goes on, due to accurate performance metrics.
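For concreteness, this human-AI example can be written in the notation of the incremental model. The concrete labels chosen here for the action $A$ and the context $C$ are illustrative:
\[
\langle X, Y, A, G, C\rangle = \langle \text{developer},\; \text{ML algorithm},\; \text{producing predictions},\; \text{project goal},\; \text{client project}\rangle.
\]
In the beginning, trust in this constellation is at most reflective: $X$ has reasons to believe that $Y$ can achieve $G$, but still exercises control. Only repeated, verifiable success of $Y$ moves the relationship towards paradigmatic trust.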
\section{Computational Aspects of Trustworthy AI}
\label{sec:taxonomy}

While there have been rapid advances in the quality of machine learning models and neural networks, scholars, the public and policymakers are increasingly recognizing the dangers of artificial intelligence. Concerns ranging from privacy and methods for deanonymizing individual data points to discrimination through learned biases and environmental impacts have prompted a new area of research focused on altering the models to alleviate these concerns. A recent survey by \textcite{liuTrustworthyAIComputational2021} summarizes the state of the art in trustworthy AI research. The authors collect research from a computational perspective and divide it into six categories: safety and robustness, non-discrimination and fairness, explainability, privacy, accountability and auditability, and environmental well-being. The following sections summarize the computational methods for each category.

\subsection{Safety and Robustness}

Machine learning models should be able to give robust results even in the face of adversarial attacks or naturally occurring noise in the training data. It has been shown that even small perturbations of a model's inputs can disproportionately affect its predictions \cite{madryDeepLearningModels2019}. In order to build models which retain their accuracy and general performance even under less than ideal circumstances, it is necessary to study different forms of attacks and how to defend against them. Safe and robust models lead to increases in trustworthiness because beneficiaries can more easily depend on their results (reflective trust).

\subsubsection{Threat models}
describe by which methods an attacker manages to break the performance of a particular machine learning algorithm. \emph{Poisoning attacks} allow an attacker to intentionally introduce bad samples into the training set, which results in wrong predictions by the model. While many models are trained beforehand, other models are constantly being updated with data they receive from their beneficiaries. One such example is Netflix's movie recommendation system, which learns which types of movies certain users are interested in. A malicious user could therefore attack the recommendation engine by supplying wrong inputs. \emph{Evasion attacks} consist of alterations made to input samples at inference time in such a way that these alterations—while generally invisible to the human eye—mislead the algorithm. \emph{White-box attacks} allow an attacker to clearly see all parameters and all functions of a model. \emph{Black-box attackers}, on the other hand, can only give inputs to the model and obtain the outputs. The former type of attack is generally easier to carry out. \emph{Targeted attacks} are aimed, for example, at specific classes of a machine learning classifier. Suppose a model is trained to recognize facial features. In a targeted attack, an attacker would try to feed inputs to the model such that just one person is consistently incorrectly classified. This type of attack is in contrast to \emph{non-targeted attacks}, which seek to undermine the model's performance in general. Targeted attacks are usually much harder to detect, as the predictions are correct overall but incorrect for a tiny subset.
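To make the notion of an evasion attack concrete, the following sketch implements the fast gradient sign method (FGSM), a standard white-box evasion attack. It is not a method taken from the survey above, but it illustrates the idea: the input is perturbed by a small, bounded step in the direction that increases the model's loss the most. The code assumes a PyTorch classifier; the function name and the perturbation budget are illustrative.

\begin{verbatim}
import torch
import torch.nn.functional as F

def fgsm_evasion(model, x, y, epsilon=0.03):
    """Craft adversarial inputs with the fast gradient sign method.

    `model` is assumed to be a PyTorch classifier returning class logits,
    `x` a batch of inputs scaled to [0, 1] and `y` the true labels.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction of the sign of the loss gradient; the
    # perturbation is bounded by epsilon and often imperceptible.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
\end{verbatim}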
\subsubsection{Defenses against adversarial attacks}
are specific to the domain a model is working in. \textcite{xuAdversarialAttacksDefenses2020} describe different attacks and defenses for text, image and graph data in deep neural networks. Defending against adversarial attacks often has negative impacts on training time and accuracy \cite{tsiprasRobustnessMayBe2019}. Balancing these trade-offs is therefore critical for real-world applications.

\subsection{Non-discrimination and Fairness}

Non-discrimination and fairness are two important properties of any artificial intelligence system. If one or both of them are violated, trust in the system erodes quickly. Often, researchers only find out about a system's discriminatory behavior after the system has been in place for a long time. In other cases—such as with the chatbot Tay from Microsoft—the problems become immediately apparent once the algorithm is live. Countless other models have been shown to be biased on multiple fronts: the US recidivism prediction software \textsc{COMPAS} is biased against black people \cite{angwinMachineBias2016}, camera software for blink detection is biased against Asian eyes \cite{roseFaceDetectionCamerasGlitches2010}, and the placement of career advertisements discriminates based on gender \cite{lambrechtAlgorithmicBiasEmpirical2019}.

Some biases are already included in the data from which an algorithm learns to differentiate between different samples. Examples include \emph{measurement bias}, \emph{aggregation bias} and \emph{representation bias} \cite{lambrechtAlgorithmicBiasEmpirical2019}. If biases are present in systems that are already being used by people worldwide, these systems can in turn influence users' behavior through \emph{algorithmic bias}, \emph{popularity bias} and \emph{emergent bias} \cite{friedmanBiasComputerSystems1996,lambrechtAlgorithmicBiasEmpirical2019}.

Not all biases are bad. In order for models to work properly, some form of bias must be present, or there is no room for the model to generalize away from individual samples to common properties. This is what is commonly referred to as \emph{productive bias} \cite{liuTrustworthyAIComputational2021}. It is often introduced by the assumptions engineers of machine learning algorithms make about a specific problem. If the model architects make incorrect assumptions about the data, productive bias quickly turns into \emph{erroneous bias}. The last category of bias is \emph{discriminatory bias}, which is of particular relevance when designing artificial intelligence systems.

Fairness, on the other hand, is \enquote{…the absence of any prejudice or favoritism towards an individual or a group based on their inherent or acquired characteristics} \cite[p.~2]{mehrabiSurveyBiasFairness2021}. Fairness in the context of artificial intelligence thus means that the system treats groups or individuals with similar traits similarly.

\subsubsection{Bias assessment tools}
allow researchers to quantify the bias and fairness of a machine learning algorithm's outputs. One such assessment tool is Aequitas \cite{saleiroAequitasBiasFairness2019}; another, developed by IBM, is the AI Fairness 360 toolkit \cite{bellamyAIFairness3602018}.
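As a minimal illustration of the kind of group fairness metric such tools report, the following sketch computes the statistical parity difference, i.e., the difference in positive-prediction rates between two groups. The data and the binary group encoding are hypothetical.

\begin{verbatim}
import numpy as np

def statistical_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between two groups.

    `y_pred` holds binary predictions and `group` a binary protected
    attribute; a value of 0 means that both groups receive positive
    predictions at the same rate.
    """
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_0 = y_pred[group == 0].mean()  # positive rate in group 0
    rate_1 = y_pred[group == 1].mean()  # positive rate in group 1
    return rate_1 - rate_0

# Hypothetical predictions for ten individuals in two groups.
print(statistical_parity_difference(
    y_pred=[1, 1, 0, 1, 0, 1, 0, 0, 0, 0],
    group=[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))
\end{verbatim}

Toolkits such as Aequitas and AI Fairness 360 report this and a range of related metrics across many group definitions.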
\subsubsection{Bias mitigation techniques}
deal with unwanted bias in artificial intelligence systems. Depending on the stage at which they are introduced, they can be applied during \emph{pre-processing}, \emph{in-processing} or \emph{post-processing} \cite{liuTrustworthyAIComputational2021}. If it is possible to access the training data beforehand, pre-processing methods are particularly effective. Undersampled classes can be deliberately weighted differently than majority classes to achieve a better distribution over all samples. Re-weighting can also be applied during training of the algorithm, by first training on the samples and then re-training with the weights obtained from the first training iteration. Post-processing methods include transforming the trained model after the fact to account for potentially biased outputs. Balancing these transformations can be a difficult endeavor because prediction accuracy can suffer.

\subsection{Explainability}

Recent advances in artificial intelligence can mostly be attributed to ever-increasing model complexity, made possible by massive deep neural networks (DNNs) and other similarly complex architectures. Due to their size, these models are treated as black boxes with no apparent way to know how a particular prediction came to be. This lack of explainability makes it difficult for humans to trust artificial intelligence systems, especially in critical areas such as medicine. To counter the trend towards difficult-to-understand artificial intelligence systems, a new research field called \emph{eXplainable Artificial Intelligence} (XAI) has emerged.

Scholars distinguish between two similar but slightly different terms: \emph{explainability} and \emph{interpretability}. Interpretable systems allow humans to \emph{look inside} the model to determine which predictions it is going to make. This is only possible if most or all parameters of the model are visible to an observer and changes to those parameters result in predictable changes in outputs. Explainability, on the other hand, applies to black-box systems such as deep neural networks, where the system explains its predictions after the fact.

The definition of interpretability already provides one possibility for explainable models. If the model is constructed in a way which makes the parameters visible and a decision can be traced from a starting point to the outcome, the model is inherently explainable. Examples are decision trees, linear regression models, rule-based models and Bayesian networks. This approach is not possible for neural networks and thus \emph{model-agnostic explanations} have to be found. \textsc{LIME} \cite{ribeiroWhyShouldTrust2016} is a tool to find such model-agnostic explanations. \textsc{LIME} works \enquote{…by learning an interpretable model locally around the prediction} \cite[p.~1]{ribeiroWhyShouldTrust2016}. An advantage of this approach is that \textsc{LIME} is useful for any model, regardless of how it is constructed. Due to the high amount of flexibility introduced by model\nobreakdash-agnostic explanations, these can even be used for already interpretable models such as random forest classifiers.

Deep neural networks can also be explained using either a \emph{gradient-based} or a \emph{perturbation-based} explanation algorithm. Gradient-based algorithms attempt to evaluate how much the outputs change if the inputs are modified. If the gradient for a set of inputs is large, those inputs have a large effect on the outputs. Conversely, a small gradient indicates that a change in the inputs does not affect the outputs to a large extent.
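A minimal sketch of the gradient-based idea is shown below, assuming a PyTorch classifier that maps a batched input to class scores; the function name and shapes are illustrative. The absolute gradient of a class score with respect to the input serves as a simple per-feature relevance estimate.

\begin{verbatim}
import torch

def input_gradient_saliency(model, x, target_class):
    """Gradient-based explanation for a single input.

    `model` is assumed to be a PyTorch classifier returning class scores
    for a batched input `x` of shape (1, ...).
    """
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    # Large absolute gradients mark input features whose variation
    # changes the target class score the most.
    return x.grad.abs()
\end{verbatim}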
Perturbation-based explanations work by finding perturbations in the inputs that alter the model's predictions the most. \textsc{LIME} is an example of a perturbation-based explanation algorithm.

\subsection{Privacy}

\subsection{Accountability and Auditability}

\subsection{Environmental Well-Being}

\section{Social Computing}
\label{sec:social-computing}

\section{Conclusion}
\label{sec:conclusion}

\printbibliography

\end{document}