Finish chapter 3

parent 35acd07570
commit 7b0662b728

thesis/graphics/bottleneck/bottleneck.pdf (new binary file, not shown)

thesis/graphics/bottleneck/bottleneck.tex (new file, 42 lines)
@@ -0,0 +1,42 @@
\documentclass{standalone}

\usepackage{tikz}

\usetikzlibrary{graphs,quotes}

\tikzstyle{block} = [draw, rectangle]
\tikzstyle{sum} = [draw, fill=blue!20, circle, node distance=1cm]
\tikzstyle{input} = [coordinate]
\tikzstyle{output} = [coordinate]
\tikzstyle{pinstyle} = [pin edge={to-,thin,black}]

\begin{document}

% \tikz\graph[grow down=3em]
% {
%   x [as=$\mathbf{x}$]
%   ->[thick] wl1 [block,as=weight layer]
%   ->[thick,"relu"] wl2 [block,as=weight layer]
%   ->[thick] plus [as=$\bigoplus$]
%   ->[thick,"relu"]
%   empty [as=];
%   x ->[thick,bend left=90,distance=5em,"$\mathbf{x}$"] plus;
% };

\begin{tikzpicture}
  \node (start) at (0,0) {};
  \node[draw] (wl1) at (0,-1) {$1 \times 1$, $64$};
  \node[draw] (wl2) at (0,-2) {$3 \times 3$, $64$};
  \node[draw] (wl3) at (0,-3) {$1 \times 1$, $256$};
  \node (plus) at (0,-4) {$\bigoplus$};
  \node (end) at (0,-4.75) {};

  \draw[->,thick] (start) to node[near start,left] {$256$-d} (wl1);
  \draw[->,thick] (wl1) to node[right] {relu} (wl2);
  \draw[->,thick] (wl2) to node[right] {relu} (wl3);
  \draw[->,thick] (wl3) to (plus);
  \draw[->,thick] (plus) to node[right] {relu} (end);
  \draw[->,thick] (0,-0.35) to[bend left=90,distance=5em]
    node[right,align=center] {identity} (plus);
\end{tikzpicture}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:
@@ -1971,16 +1971,10 @@ from them.

\chapter{Prototype Design}
\label{chap:design}

The following sections establish the requirements as well as the
general design philosophy of the prototype. We will then go into
detail about the selected model architectures and data augmentations
which are applied during training.

\section{Requirements}
\label{sec:requirements}
@@ -2350,7 +2344,7 @@ weight updates which can stop the learning process entirely.

There are multiple potential solutions to the vanishing gradient
problem. Different weight initialization schemes
\cite{glorot2010,sussillo2015} as well as \gls{bn} layers
\cite{ioffe2015} can help mitigate the problem. The most effective
solution yet, however, was proposed as \emph{residual connections} by
\textcite{he2016}. Instead of connecting each layer only to the
@@ -2364,37 +2358,88 @@ figure~\ref{fig:residual-connection}).

  \includegraphics[width=0.35\textwidth]{graphics/residual-connection/res.pdf}
  \caption[Residual connection]{Residual connections: information from
    previous layers flows into subsequent layers before the activation
    function is applied. The shortcut connection provides a path for
    information to \emph{skip} multiple layers. These connections are
    parameter-free because of the identity mapping. The symbol
    $\bigoplus$ represents simple element-wise addition. Figure
    redrawn from \textcite{he2016}.}
  \label{fig:residual-connection}
\end{figure}
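
Written out, a residual block computes
\[
  \mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x},
\]
where $\mathbf{x}$ is the input of the block and
$\mathcal{F}(\mathbf{x})$ is the residual mapping learned by the
stacked weight layers. Because the identity term contributes a
constant gradient of one during backpropagation, the gradient signal
has a direct path to earlier layers, which counteracts the vanishing
gradient problem.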

\textcite{he2016} develop a new architecture called \emph{ResNet}
based on VGGNet (see section~\ref{sssec:theory-vggnet}) which includes
residual connections after every second convolutional layer. The
filter sizes in their approach are smaller than in VGGNet, which
results in far fewer trainable parameters overall. Since residual
connections do not add additional parameters and are relatively easy
to add to existing network structures, the authors compare four
versions of their architecture: one with $18$ and the other with $34$
layers, each with (ResNet) and without (plain ResNet) residual
connections. Curiously, the $34$-layer \emph{plain} network performs
worse on ImageNet classification than the $18$-layer plain
network. Once residual connections are used, however, the $34$-layer
network outperforms the $18$-layer version by $2.85$ percentage points
on the top-1 error metric of ImageNet.

\begin{figure}
  \centering
  \includegraphics[width=0.3\textwidth]{graphics/bottleneck/bottleneck.pdf}
  \caption[Bottleneck building block]{A bottleneck building block used
    in the ResNet-50, ResNet-101 and ResNet-152 architectures. The
    $1 \times 1$ convolutions serve first as a reduction and then as
    an inflation of dimensions. The dimension reduction results in
    lower input and output dimensions for the $3 \times 3$ layer and
    thus improves training time. Figure redrawn from \textcite{he2016}
    with our own small changes.}
  \label{fig:bottleneck}
\end{figure}

We use the ResNet-50 model developed by \textcite{he2016}, pretrained
on ImageNet, in our own work. The $50$-layer model uses
\emph{bottleneck building blocks} instead of the two $3 \times 3$
convolutional layers which lie in-between the residual connections of
the smaller ResNet-18 and ResNet-34 models. We chose this model
because it provides a suitable trade-off between model complexity and
inference time.
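
Writing $\sigma$ for the ReLU activation and omitting batch
normalization for brevity, the bottleneck block computes
\[
  \mathbf{y} = \sigma\bigl(W_{1 \times 1}^{256}\,
  \sigma\bigl(W_{3 \times 3}^{64}\,
  \sigma\bigl(W_{1 \times 1}^{64}\,\mathbf{x}\bigr)\bigr)
  + \mathbf{x}\bigr),
\]
where each $W$ denotes a convolution with the given kernel size and
the superscript gives its number of output channels.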

\subsection{Data Augmentation}
\label{sec:methods-augmentation}

Data augmentation is an essential part of every training process
throughout machine learning. By \emph{perturbing} already existing
data with transformations, model engineers achieve an artificial
enlargement of the data set, which allows the machine learning model
to learn more robust features. It can also reduce overfitting for
smaller data sets. In the object detection world, special
augmentations such as \emph{mosaic} help with edge cases which might
crop up during inference. For example, by combining four or more
images of the training set into one, the model better learns to draw
bounding boxes around objects which are cut off at the edges of the
individual images. Since we use data augmentation extensively during
the training phases, we list a small selection of them below.

\begin{description}
\item[HSV-hue] Randomly change the hue of the color channels.
\item[HSV-saturation] Randomly change the saturation of the color
  channels.
\item[HSV-value] Randomly change the value of the color channels.
\item[Translation] Randomly \emph{translate}, that is, move the image
  by a specified number of pixels.
\item[Scaling] Randomly scale the image up or down by a factor.
\item[Rotation] Randomly rotate the image.
\item[Inversion] Randomly flip the image along the $x$- or the
  $y$-axis.
\item[Mosaic] Combine multiple images into one in a mosaic
  arrangement.
\item[Mixup] Create a linear combination of multiple images.
\end{description}
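
The mixup augmentation, for example, forms a convex combination of
two training images $\mathbf{x}_i$ and $\mathbf{x}_j$ (and,
analogously, of their labels):
\[
  \tilde{\mathbf{x}} = \lambda \mathbf{x}_i + (1 - \lambda) \mathbf{x}_j,
  \qquad \lambda \in [0, 1].
\]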

These augmentations can either be defined to happen with a fixed
value and a specified probability, or they can be applied to all
images with a value that is not fixed. For example, one can specify a
range for the degree of rotation, and every image is rotated by a
random value within that range. These two options can also be
combined to rotate an image by a random value within a range with a
specified probability.
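
As a sketch, a randomized rotation with probability $p$ and range $r$
draws an angle $\theta \sim \mathcal{U}(-r, r)$ and, with probability
$p$, replaces the image $\mathbf{x}$ by the rotated image
\[
  \mathbf{x}' = R_{\theta}\,\mathbf{x},
\]
where $R_{\theta}$ denotes rotation by $\theta$; otherwise
$\mathbf{x}$ is left unchanged.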

\chapter{Prototype Implementation}
\label{chap:implementation}