\documentclass[runningheads]{llncs}
\usepackage{graphicx}
\usepackage[backend=biber,style=numeric]{biblatex}
\usepackage{hyperref}
\usepackage{amsmath}
\hypersetup{
  colorlinks=true,
  linkcolor=black,
  urlcolor=blue,
  citecolor=black
}
\addbibresource{trustworthy-ai.bib}

\begin{document}

\title{Trustworthy Artificial Intelligence}
\author{Tobias Eidelpes}
\authorrunning{T. Eidelpes}
\institute{Technische Universität Wien, Karlsplatz 13, 1040 Wien, Austria \email{e1527193@student.tuwien.ac.at}}
\maketitle

\begin{abstract}
The abstract should briefly summarize the contents of the paper in 150--250 words.
\keywords{Artificial Intelligence, Trustworthiness, Social Computing}
\end{abstract}

\section{Introduction}
\label{sec:introduction}

The use of artificial intelligence (AI) in computing has seen an unprecedented rise over the last few years. From humble beginnings as a tool to aid humans in decision making to advanced use cases where human interaction is avoided as much as possible, AI has transformed the way we live today. Its transformative capabilities are not confined to computer science but have spread into a diverse set of other disciplines such as biology, chemistry, mathematics and economics. For the purposes of this work, AIs are machines that can learn, make decisions autonomously and interact with their environment~\cite{russell_artificial_2021}.

While the possibilities of AI seem endless, the public is slowly but steadily learning about its limitations. These limitations become apparent in fields such as autonomous driving and medicine, where AI can have a direct and potentially life-changing impact on people's lives. A self-driving car operates on roads where accidents can happen at any time, and the decisions the car makes before, during and after an accident can have severe consequences for everyone involved. In medicine, AIs are increasingly used to guide human decision-making.
The more critical the proper use and functioning of AI is, the more trust in its architecture and results is required. Trust, however, is not easily defined, especially in relation to artificial intelligence. This work explores the following question: \emph{Can artificial intelligence be trustworthy, and if so, how?} To discuss this question, trust first has to be defined and dissected into its constituent components.

Section~\ref{sec:modeling-trust} analyzes trust and condenses the resulting insights into a framework suitable for interactions between humans and artificial intelligence. Section~\ref{sec:taxonomy} approaches trustworthiness in artificial intelligence from a computing perspective: there are various technical means to make AIs more \emph{trustworthy}, and this section summarizes and discusses important methods and approaches. Section~\ref{sec:social-computing} discusses how humans and artificial intelligence can be combined into one coherent system that is capable of achieving more than either of its parts on its own.

\section{Trust}
\label{sec:modeling-trust}

To define the requirements and goals of \emph{trustworthy AI}, it is important to know what trust is and how we humans establish trust with someone or something. This section therefore defines and explores different forms of trust.

\subsection{Defining Trust}

Commonly, \emph{trusting someone} means having confidence in another person's ability to do certain things. This can mean that we trust someone to tell us the truth or to competently do the things that we \emph{entrust} them with. We trust the person delivering the mail to do so on time and without losing any of it on the way to our doors. We trust people knowledgeable in a certain field such as medicine to advise us when we need their expertise. Trusting in these contexts means ceding control over a particular aspect of our lives to someone else.
We do so in the expectation that the trustee will not violate our \emph{social agreement} by acting against our interests. Often we are not able to confirm that the trustee has indeed done their job. Sometimes we only find out later that what was done was in fact not in line with our interests. Trust is therefore also always a function of time: depending on their track record, people we have trusted before can either continue to be trusted or lose our trust.

We do not only trust people to act on our behalf; we can also place trust in things. Every technical device or gadget receives our trust to some extent, because we expect it to perform the function it was built for. This relationship encompasses \emph{dumb} devices such as vacuum cleaners and refrigerators as well as seemingly \emph{intelligent} systems such as algorithms performing medical diagnoses. Artificial intelligence systems belong to the latter category when they are functioning well, but can easily slip into the former, for example in the case of a poorly trained machine learning algorithm that classifies every picture of a dog or a cat as a dog.

Scholars usually distinguish between \emph{cognitive} and \emph{noncognitive} forms of trust. While cognitive trust involves some sort of rational and objective evaluation of the trustee's capabilities, noncognitive trust lacks such an evaluation. For instance, if a patient comes to a doctor with a health problem that falls within the doctor's field, the patient will place trust in the doctor because of the doctor's experience, track record and education. The patient thus consciously decides to trust the doctor, rather than a friend without any relevant expertise, to solve the problem. Conversely, noncognitive trust allows humans to place trust in people they know well without any rational justification, simply because of their existing relationship.
Due to the many dimensions of trust and its inherent complexity in different contexts, frameworks for trust are an active field of research. One such framework, proposed by \textcite{ferrario_ai_2020}, is discussed in the following sections.

\subsection{Incremental Model of Trust}

The framework by \textcite{ferrario_ai_2020} distinguishes three types of trust: simple trust, reflective trust and paradigmatic trust. Formally, the model consists of the triple
\[
  T = \langle\text{simple trust}, \text{reflective trust}, \text{paradigmatic trust}\rangle
\]
\noindent and the 5-tuple
\[
  \langle X, Y, A, G, C\rangle
\]
\noindent where $X$ and $Y$ denote the interacting agents, $A$ the action to be performed by agent $Y$ to achieve goal $G$, and $C$ the context in which the action takes place.

\subsubsection{Simple Trust} is a noncognitive form of trust and the least demanding form of trust in the incremental model. $X$ trusts $Y$ to perform an action $A$ to pursue the goal $G$ without requiring additional information about $Y$'s ability to generate a satisfactory outcome. In other words, $X$ \emph{depends} on $Y$ to perform an action. $X$ has no control over the process and does not want to control either the process or the outcome. Many day-to-day interactions involve some form of simple trust: we (simply) trust a stranger on the street to show us the right way when we are lost. Sometimes simple trust is unavoidable because the trustor is unable to obtain additional information about the other party. Children, for example, have to simply trust adults, not because they want to but out of necessity. This changes as they grow older and become better able to judge other people.

\subsubsection{Reflective Trust} adds another layer to simple trust: trustworthiness. In this model, trustworthiness is the cognitive belief of $X$ that $Y$ is worthy of trust, i.e., that $Y$ will perform $A$ satisfactorily in pursuit of $G$.
Reflective trust involves a cognitive process that allows a trustor to obtain reasons for trusting a potential trustee: $X$ believes in the trustworthiness of $Y$ because there are reasons for $Y$ being trustworthy. Unlike simple trust, reflective trust includes the aspect of control. For an agent $X$ to \emph{reflectively} trust another agent $Y$, $X$ has objective reasons to trust $Y$ but is not willing to do so without control. Reflective trust need not be binary; it can also be expressed as a subjective degree of confidence. The more likely it is that $Y$ performs action $A$ towards goal $G$, the higher $X$'s confidence in $Y$. However, $X$ might have high reflective trust in $Y$ and still not trust $Y$ to perform a given task for other, potentially unconscious, reasons.

\subsubsection{Paradigmatic Trust} is the last form of trust in the incremental model proposed by \textcite{ferrario_ai_2020}. In addition to having objective reasons to trust $Y$, $X$ is also willing to do so without control. Paradigmatic trust is thus a combination of simple trust and reflective trust: simple trust provides the noncognitive, noncontrolling aspect and reflective trust provides the cognitive aspect.

\subsection{Application of the Model}

Since the incremental model of trust can be applied to human-human as well as human-AI interactions, the following example draws from both domains. The setting is that of a company which ships tailor-made machine learning (ML) solutions to other firms. On the human-human side, there are multiple teams working on different aspects of the software. The hierarchical relationships between managers, team leaders and developers involve different forms of trust. A manager has worked with a specific team leader in the past and thus knows from experience that the team leader can be trusted without control (paradigmatic trust).
The team leader has led this particular team for a number of projects but has recently hired a new junior developer. Thanks to the new hire's impressive credentials, the team leader has some objective evidence that they are capable of delivering good work on time, but needs more time before trusting the new colleague without control (reflective trust).

On the human-AI side, a developer is working on an ML algorithm to achieve a specific goal $G$. Mapped onto the 5-tuple of the incremental model, $X$ is the developer, $Y$ is the ML algorithm and $A$ is the action the algorithm takes to achieve $G$. In the beginning, $X$ does not yet trust $Y$ to do its job properly, because there are no past performance metrics showing how well the algorithm achieves $G$. Initially, $X$ therefore has to control most, if not all, of $Y$'s parameters; as $Y$ achieves $G$ consistently, less and less control is needed. Over time, the accumulating performance metrics also increase $X$'s cognitive trust in $Y$. As control decreases while cognitive trust grows, $X$'s trust in $Y$ moves towards paradigmatic trust.

\section{Computational Aspects of Trustworthy AI}
\label{sec:taxonomy}

\section{Social Computing}
\label{sec:social-computing}

\section{Conclusion}
\label{sec:conclusion}

\printbibliography

\end{document}