Add Transfer Learning section
This commit is contained in:
parent 832e21ee41
commit 50d36011a5

@@ -1613,12 +1613,69 @@ top-1 accuracy of 75.2\% and 67.4\% respectively.
 
 \section{Transfer Learning}
 \label{sec:background-transfer-learning}
 
-Give a definition of transfer learning and explain how it is
-done. Compare fine-tuning just the last layers vs. propagating changes
-through the whole network. What are advantages to transfer learning?
-Are there any disadvantages?
+Transfer learning refers to the application of a learning algorithm to
+a target domain by utilizing knowledge already learned from a
+different source domain \cite{zhuang2021}. The learned representations
+from the source domain are thus \emph{transferred} to solve a related
+problem in another domain. Transfer learning works because
+semantically meaningful information an algorithm has learned from a
+(large) data set is often meaningful in other contexts as well, even
+though the \emph{new problem} is not exactly the one for which the
+original model had been trained. An analogy can be drawn to sports in
+day-to-day human life: intuitively, skills learned in soccer such as
+ball control, improved endurance and strategic thinking are often
+also useful in other ball sports, and someone who is adept at certain
+kinds of sports will likely be able to pick up similar ones much
+faster.
 
-Estimated 2 pages for this section.
+In mathematical terms, \textcite{pan2010} define transfer learning as:
+
+\begin{quote}
+  Given a source domain $\mathcal{D}_{S}$ and learning task
+  $\mathcal{T}_{S}$, a target domain $\mathcal{D}_{T}$ and learning
+  task $\mathcal{T}_{T}$, transfer learning aims to help improve the
+  learning of the target predictive function $f_{T}(\cdot)$ in
+  $\mathcal{D}_{T}$ using the knowledge in $\mathcal{D}_{S}$ and
+  $\mathcal{T}_{S}$, where $\mathcal{D}_{S}\neq\mathcal{D}_{T}$, or
+  $\mathcal{T}_{S}\neq\mathcal{T}_{T}$. \cite[p.~1347]{pan2010}
+\end{quote}
+
+In the machine learning world, collecting and labeling data for
+training a model is often time-consuming, expensive and sometimes not
+possible at all. Deep learning based models especially require
+substantial amounts of data to be able to robustly classify images or
+solve other tasks. Semi-supervised or unsupervised learning
+approaches (see section~\ref{sec:theory-ml}) can partially mitigate
+this problem, but having accurate ground truth data is usually a
+requirement nonetheless. Through the publication of large labeled
+data sets such as those of the \glspl{ilsvrc}, a basis for
+(pre-)training exists from which a model can be optimized for
+downstream tasks.
+
+Transfer learning is not a panacea, however. Care has to be taken to
+only use models which have been pretrained on a source domain that is
+similar to the target domain in terms of feature space. While this
+may seem to be an easy task, it is often not known in advance whether
+transfer learning is the correct approach. Furthermore, choosing
+whether to replace and retrain only the fully-connected layers at the
+end of a pretrained model or to fine-tune all parameters introduces
+at least one additional hyperparameter. These decisions have to be
+made by comparing the source domain with the target domain, by
+considering how much data and how many computational resources are
+available for the target domain, and by observing which layers are
+responsible for which features. Since earlier layers usually contain
+low-level and later layers high-level information, resetting the
+weights of the last few layers or replacing them with different ones
+entirely is also an option.
+
+To summarize, while transfer learning is an effective tool and likely
+a major factor in the proliferation of deep learning based models,
+not all domains are suited for it. The additional decisions which
+have to be made as a result of using transfer learning can introduce
+more complexity than would otherwise be necessary for a particular
+problem. It does, however, allow researchers to get started quickly
+and to iterate faster, because popular network architectures
+pretrained on ImageNet are integrated into the major machine learning
+frameworks. Transfer learning is used extensively in this work to
+train a classifier as well as an object detection model.
 
 \section{Hyperparameter Optimization}
 \label{sec:background-hypopt}
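As a gloss on the quoted definition (kept outside the diff, since it is not part of the commit): in \textcite{pan2010}'s notation a domain and a task are themselves pairs, which is worth recalling when reading the condition $\mathcal{D}_{S}\neq\mathcal{D}_{T}$ or $\mathcal{T}_{S}\neq\mathcal{T}_{T}$.

```latex
% Notation behind the quoted definition (after Pan and Yang, 2010):
% a domain is a feature space plus a marginal distribution over it,
% and a task is a label space plus the predictive function to learn.
\mathcal{D} = \{\mathcal{X}, P(X)\}, \qquad
\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}
```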
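The choice the added section describes, between retraining only the final fully-connected layers and fine-tuning the whole network, can be made concrete with a short sketch. This is a minimal, hypothetical example (not taken from the commit) using torchvision's ResNet-50 weights pretrained on ImageNet; the class count and learning rates are placeholder assumptions.

```python
# Hypothetical sketch of the two transfer learning regimes discussed in
# the section above, using a torchvision ResNet-50 pretrained on ImageNet.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # assumed number of classes in the target domain

# Load a backbone whose weights were learned on the (source) ImageNet domain.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Variant 1: feature extraction. Freeze all pretrained weights and train
# only a newly initialized fully-connected head for the target task.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new head, trainable
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # assumed lr

# Variant 2: full fine-tuning. Propagate gradients through the whole
# network, typically with a much smaller learning rate so the pretrained
# low-level features in the early layers are not destroyed.
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # assumed lr
```

An intermediate option, matching the section's remark about resetting or replacing only the last few layers, would be to unfreeze just the final residual blocks (e.g. `model.layer4`) while keeping the earlier, low-level feature extractors frozen.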