\documentclass[runningheads]{llncs}

\usepackage{graphicx}
\usepackage[backend=biber,style=numeric]{biblatex}
\usepackage{hyperref}
\usepackage{amsmath}

\hypersetup{
colorlinks=true,
linkcolor=black,
urlcolor=blue,
citecolor=black
}

\addbibresource{trustworthy-ai.bib}

\begin{document}

\title{Trustworthy Artificial Intelligence}
\author{Tobias Eidelpes}
\authorrunning{T. Eidelpes}

\institute{Technische Universität Wien, Karlsplatz 13, 1040 Wien, Austria
\email{e1527193@student.tuwien.ac.at}}

\maketitle

\begin{abstract}
The abstract should briefly summarize the contents of the paper in
150--250 words.

\keywords{Artificial Intelligence, Trustworthiness, Social Computing}
\end{abstract}

\section{Introduction}
\label{sec:introduction}

The use of artificial intelligence (AI) in computing has seen an unprecedented
rise over the last few years. From humble beginnings as a tool to aid humans in
decision making to advanced use cases where human interaction is avoided as much
as possible, AI has transformed the way we live our lives today. The
transformative capabilities of AI are not just felt in the area of computer
science, but have bled into a diverse set of other disciplines such as biology,
chemistry, mathematics and economics. For the purposes of this work, AIs are
machines that can learn, make decisions autonomously and interact with the
environment~\cite{russellArtificialIntelligenceModern2021}.

While the possibilities of AI are seemingly endless, the public is slowly but
steadily learning about its limitations. These limitations manifest themselves
in areas such as autonomous driving and medicine, fields where AI can have a
direct, potentially life-changing impact on people's lives. A
self-driving car operates on roads where accidents can happen at any time.
Decisions made by the car before, during and after an accident can result in
severe consequences for all participants. In medicine, AIs are increasingly used
to drive human decision-making. The more critical the proper use and functioning
of AI is, the more trust in its architecture and results is required. Trust,
however, is not easily defined, especially in relation to artificial
intelligence.

This work will explore the following question: \emph{Can artificial intelligence
be trustworthy, and if so, how?} To be able to discuss this question, trust has
to be defined and dissected into its constituent components.
Section~\ref{sec:modeling-trust} analyzes trust and molds the gained insights
into a framework suitable for interactions between humans and artificial
intelligence. Section~\ref{sec:taxonomy} approaches trustworthiness in
artificial intelligence from a computing perspective. There are various ways to
make AIs more \emph{trustworthy} through the use of technical means. This
section seeks to discuss and summarize important methods and approaches.
Section~\ref{sec:social-computing} discusses combining humans and artificial
intelligence into one coherent system which is capable of achieving more than
either of its parts on its own.

\section{Trust}
\label{sec:modeling-trust}

In order to define the requirements and goals of \emph{trustworthy
AI}, it is important to know what trust is and how we humans establish trust
with someone or something. This section therefore defines and explores different
forms of trust.

\subsection{Defining Trust}

Commonly, \emph{trusting someone} means having confidence in another person's
ability to do certain things. This can mean that we trust someone to speak the
truth to us or that a person is competently doing the things that we
\emph{entrust} them to do. We trust the person delivering the mail to do
so on time and without mail getting lost on the way to our doors. We trust
people knowledgeable in a certain field such as medicine to be able to advise us
when we need medical advice. Trusting in these contexts means ceding control
over a particular aspect of our lives to someone else. We do so in the
expectation that the trustee does not violate our \emph{social agreement} by
acting against our interests. Often we are not able to confirm that the trustee
has indeed done their job. Sometimes we only find out later that what was
actually done was not in line with our own interests. Trust is therefore
also always a function of time. Previously entrusted people can, depending on
their track record, either continue to be trusted or lose trust.

We do not only trust certain people to act on our behalf, we can also place
trust in things rather than people. Every technical device or gadget receives
our trust to some extent, because we expect it to function as intended.
This relationship encompasses \emph{dumb} devices such as vacuum cleaners
and refrigerators, as well as seemingly \emph{intelligent} systems such as
algorithms performing medical diagnoses. Artificial intelligence systems belong
to the latter category when they are functioning well, but can easily slip into
the former, for example in the case of a poorly trained machine learning
algorithm that always classifies pictures of dogs and cats as dogs.

Scholars usually divide trust into either \emph{cognitive} or
\emph{non-cognitive} forms. While cognitive trust involves some sort of rational
and objective evaluation of the trustee's capabilities, non-cognitive trust
lacks such an evaluation. For instance, if a patient comes to a doctor with a
health problem which resides in the doctor's domain, the patient will place
trust in the doctor because of the doctor's experience, track record and
education. The patient thus consciously decides that they would rather trust
the doctor to solve the problem than a friend who does not have any
expertise. Conversely, non-cognitive trust allows humans to place trust in
people they know well, without a need for rational justification, simply
because of their existing relationship.

Due to the different dimensions of trust and its inherent complexity in
different contexts, frameworks for trust are an active field of research. One
such framework, proposed by \textcite{ferrarioAIWeTrust2020}, will be discussed
in the following sections.

\subsection{Incremental Model of Trust}

The framework by \textcite{ferrarioAIWeTrust2020} consists of three types of
trust: simple trust, reflective trust and paradigmatic trust. Their model thus
consists of the triple

\[ T = \langle\text{simple trust}, \text{reflective trust}, \text{paradigmatic
trust}\rangle \]

\noindent and a 5-tuple

\[ \langle X, Y, A, G, C\rangle \]

\noindent where $X$ and $Y$ denote interacting agents and $A$ the action to be
performed by the agent $Y$ to achieve goal $G$. $C$ stands for the context in
which the action takes place.

\subsubsection{Simple Trust} is a non-cognitive form of trust and the least
demanding form of trust in the incremental model. $X$ trusts $Y$ to perform an
action $A$ to pursue the goal $G$ without requiring additional information about
$Y$'s ability to generate a satisfactory outcome. In other words, $X$
\emph{depends} on $Y$ to perform an action. $X$ has no control over the process
and also does not want to control it or the outcome. A lot of day-to-day
interactions happen in some form or another under simple trust: we (simply)
trust a stranger on the street to show us the right way when we are lost.
Sometimes simple trust is unavoidable because of the trustor's inability to
obtain additional information about the other party. Children, for example, have
to simply trust adults not because they want to but out of necessity. This
changes when they get older and develop their ability to better judge other
people.

\subsubsection{Reflective Trust} adds an additional layer to the simple trust
model: trustworthiness. Trustworthiness can be defined as the cognitive belief
of $X$ that $Y$ is trustworthy. Reflective trust involves a cognitive process
which allows a trustor to obtain reasons for trusting a potential trustee. $X$
believes in the trustworthiness of $Y$ because there are reasons for $Y$ being
trustworthy. Contrary to simple trust, reflective trust includes the aspect of
control. For an agent $X$ to \emph{reflectively} trust another agent $Y$, $X$
has objective reasons to trust $Y$ but is not willing to do so without control.
Reflective trust does not have to be expressed in binary form but can also be
expressed by a subjective measure of confidence. The more likely a trustee $Y$
is to perform action $A$ towards a goal $G$, the higher $X$'s confidence in $Y$
is. Additionally, $X$ might have high reflective trust in $Y$ but still not
trust $Y$ to perform a given task because of other, potentially unconscious,
reasons.

\subsubsection{Paradigmatic Trust} is the last form of trust in the incremental
model proposed by \textcite{ferrarioAIWeTrust2020}. In addition to having
objective reasons to trust $Y$, $X$ is also willing to do so without control. It
is thus a combination of simple trust and reflective trust. Simple trust
provides the non-cognitive, non-controlling aspect of trust and reflective trust
provides the cognitive aspect.

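
The three forms of trust can be summarized compactly. The following encoding is
only an illustrative sketch and not notation used by
\textcite{ferrarioAIWeTrust2020}: let $r \in \{0, 1\}$ indicate whether $X$
relies on objective reasons for believing that $Y$ is trustworthy, and let
$k \in \{0, 1\}$ indicate whether $X$ insists on control over how $Y$ performs
$A$. The form of trust that $X$ places in $Y$ for a given
$\langle X, Y, A, G, C\rangle$ is then

\[
\text{trust}(r, k) =
\begin{cases}
\text{simple trust} & \text{if } r = 0 \text{ and } k = 0,\\
\text{reflective trust} & \text{if } r = 1 \text{ and } k = 1,\\
\text{paradigmatic trust} & \text{if } r = 1 \text{ and } k = 0.
\end{cases}
\]

\noindent Under this reading, the remaining combination $r = 0$, $k = 1$ can be
taken to describe the absence of trust: $X$ has no reasons to rely on $Y$ and
does not relinquish control either.
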
\subsection{Application of the Model}

Since the incremental model of trust can be applied to human-human as well as
human-AI interactions, an example which draws from both domains will be
presented. The setting is that of a company which ships tailor-made machine
learning (ML) solutions to other firms. On the human-human interaction side
there are multiple teams working on different aspects of the software. The
hierarchical structure between bosses, their team leaders and their developers
is composed of different forms of trust. A boss has worked with a specific team
leader in the past and thus knows from experience that the team leader can be
trusted without control (paradigmatic trust). The team leader has led this
particular team for a number of projects already but has recently hired a new
junior developer. Due to impressive credentials, the team leader has some
objective proof that the new hire is capable of delivering good work on time,
but needs more time to be able to trust the new colleague without control
(reflective trust).

On the human-AI side, one developer is working on a machine learning algorithm
to achieve a specific goal $G$. Taking the 5-tuple from the incremental model,
$X$ is the developer, $Y$ is the machine learning algorithm and $A$ is the
action the machine learning algorithm takes to achieve the goal $G$. In the
beginning, $X$ does not yet trust $Y$ to do its job properly. This is due to an
absence of any past performance metric of the algorithm to achieve $G$. While
most, if not all, parameters of $Y$ have to be controlled by $X$ in the
beginning, less and less control is needed if $Y$ achieves $G$
consistently. This also increases the cognitive trust in $Y$ as time goes on due
to accurate performance metrics.

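
One simple way to make this growth of trust over time concrete is to track how
often $Y$ has achieved $G$ so far. The following measure is an assumption made
here purely for illustration and is not part of the incremental model. Let
$s_i \in \{0, 1\}$ record whether $Y$ achieved $G$ in its $i$-th run. After $t$
runs, $X$'s confidence in $Y$ could be estimated as

\[ \hat{p}_t = \frac{1}{t} \sum_{i=1}^{t} s_i . \]

\noindent For example, if $Y$ achieves $G$ in 18 of its first 20 runs, then
$\hat{p}_{20} = 18 / 20 = 0.9$. In this sketch, $X$ would relax control over $Y$
only once $\hat{p}_t$ exceeds a threshold that $X$ considers sufficient for the
context $C$.
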
\section{Computational Aspects of Trustworthy AI}
\label{sec:taxonomy}

While there have been rapid advances in the quality of machine learning models
and neural networks, scholars, the public and policymakers are increasingly
recognizing the dangers of artificial intelligence. Concerns ranging from
privacy and methods for deanonymizing individual data points to discrimination
through learned biases and environmental impacts have prompted a new area of
research which is focused on altering the models to alleviate these concerns. A
recent survey by \textcite{liuTrustworthyAIComputational2021} summarizes the
state of the art in trustworthy AI research. The authors collect research from a
computational perspective and divide it into six categories: safety and
robustness, non-discrimination and fairness, explainability, privacy,
accountability and auditability, and environmental well-being. The following
sections summarize the computational methods for each category.

\subsection{Safety and Robustness}

Machine learning models should be able to give robust results even in the face
of adversarial attacks or naturally occurring noise in the training data. It has
been shown that even small perturbations of the input data can affect the
quality of a model's predictions
disproportionately~\cite{madryDeepLearningModels2019}. In
order to build models which retain their accuracy and general performance even
under less than ideal circumstances, it is necessary to study different forms of
attacks and how to defend against them. Safe and robust models lead to increases
in trustworthiness because beneficiaries can more easily depend on their
results (reflective trust).

\subsubsection{Threat models} describe the method by which an attacker manages
to degrade the performance of a particular machine learning algorithm.
\emph{Poisoning attacks} allow an attacker to intentionally introduce bad
samples into the training set, which results in wrong predictions by the model.
While many models are trained beforehand, other models are constantly being
updated with data that the model receives from its beneficiaries. One such
example may be Netflix's movie recommendation system, which receives
information about which types of movies certain users are interested in. A
malicious user could therefore attack the recommendation engine by supplying
wrong inputs. \emph{Evasion attacks} consist of alterations which are made to
the input samples at inference time in such a way that these alterations, while
generally invisible to the human eye, mislead the algorithm.

\emph{White-box attacks} allow an attacker to clearly see all parameters and all
functions of a model. \emph{Black-box attacks}, on the other hand, only allow
the attacker to give inputs to the model and obtain the outputs. The former type
of attack is generally easier to carry out.

\emph{Targeted attacks} are aimed at specific classes of a machine learning
classifier, for example. Suppose a model is trained to recognize facial
features. In a targeted attack, an attacker would try to feed inputs to the
model such that just one person is consistently incorrectly classified. This
type of attack is in contrast to \emph{non-targeted attacks} which seek to
undermine the model's performance in general. Targeted attacks are usually much
harder to detect as the predictions are correct overall but incorrect for a tiny
subset.

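
The evasion and targeted attacks described above can be stated more precisely.
The following is a sketch in our own notation, which does not appear in the
cited works in this exact form: let $f$ be a classifier, $x$ an input with true
label $y$, and $\epsilon > 0$ a small bound on how much the attacker may change
the input. An evasion attack then searches for a perturbation $\delta$ with

\[ f(x + \delta) \neq y \quad \text{subject to} \quad \lVert \delta
\rVert_\infty \leq \epsilon , \]

\noindent where a small $\epsilon$ keeps the change imperceptible to humans. In
the white-box setting the attacker can compute the gradient of the model's loss
$L$ with respect to the input and, for instance, take a single step
$\delta = \epsilon \cdot \operatorname{sign}(\nabla_x L(f(x), y))$, an approach
known as the fast gradient sign method. A black-box attacker has to estimate a
suitable direction from input-output queries alone.

A short calculation illustrates why targeted attacks are hard to detect. Assume
a hypothetical test set of $10\,000$ face images, $100$ of which show the
targeted person. If exactly these $100$ images are misclassified and all others
are classified correctly, then

\[ \text{overall accuracy} = \frac{10\,000 - 100}{10\,000} = 0.99
\quad \text{while} \quad
\text{accuracy on the targeted person} = \frac{0}{100} = 0 . \]

\noindent An aggregate metric therefore barely changes, and the attack only
becomes visible when accuracy is reported per class.
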
\subsubsection{Defenses against adversarial attacks} are specific to the domain
a model is working in. \textcite{xuAdversarialAttacksDefenses2020} describe
different attacks and defenses for text, image and graph data in deep neural
networks. Defending against adversarial attacks often has negative impacts on
training time and accuracy~\cite{tsiprasRobustnessMayBe2019}. Balancing these
trade-offs is therefore critical for real-world applications.

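
As an illustration of where these costs come from, consider adversarial
training, one widely used defense which
\textcite{madryDeepLearningModels2019} formulate as a saddle-point problem. The
symbols below are chosen here for the sketch: $\theta$ denotes the model
parameters, $D$ the data distribution, $L$ the training loss and $\epsilon$ the
maximum size of a perturbation $\delta$.

\[ \min_{\theta} \; \operatorname{E}_{(x, y) \sim D} \Bigl[ \max_{\lVert \delta
\rVert_\infty \leq \epsilon} L(\theta, x + \delta, y) \Bigr] \]

\noindent The inner maximization searches for the most damaging perturbation of
each training sample, and the outer minimization fits the model to these
worst-case samples. Because the inner problem has to be solved approximately in
every training step, training becomes considerably more expensive than standard
training, which is one source of the trade-off mentioned above.
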
\subsection{Non-discrimination and Fairness}

\subsection{Explainability}

\subsection{Privacy}

\subsection{Accountability and Auditability}

\subsection{Environmental Well-Being}

\section{Social Computing}
\label{sec:social-computing}

\section{Conclusion}
\label{sec:conclusion}

\printbibliography

\end{document}