Finish chapter 3

Tobias Eidelpes 2023-12-01 11:02:32 +01:00
parent 35acd07570
commit 7b0662b728
4 changed files with 120 additions and 33 deletions

Binary file not shown.


@@ -0,0 +1,42 @@
\documentclass{standalone}
\usepackage{tikz}
\usetikzlibrary{graphs,quotes}
\tikzstyle{block} = [draw, rectangle]
\tikzstyle{sum} = [draw, fill=blue!20, circle, node distance=1cm]
\tikzstyle{input} = [coordinate]
\tikzstyle{output} = [coordinate]
\tikzstyle{pinstyle} = [pin edge={to-,thin,black}]
\begin{document}
% \tikz\graph[grow down=3em]
% {
% x [as=$\mathbf{x}$]
% ->[thick] wl1 [block,as=weight layer]
% ->[thick,"relu"] wl2 [block,as=weight layer]
% ->[thick] plus [as=$\bigoplus$]
% ->[thick,"relu"]
% empty [as=];
% x ->[thick,bend left=90,distance=5em,"$\mathbf{x}$"] plus;
% };
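% Bottleneck building block of ResNet-50/-101/-152: 1x1 (reduce), 3x3,
% 1x1 (restore) convolutions around a parameter-free identity shortcut;
% redrawn after He et al. (2016).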
\begin{tikzpicture}
\node (start) at (0,0) {};
\node[draw] (wl1) at (0,-1) {$1 \times 1$, $64$};
\node[draw] (wl2) at (0,-2) {$3 \times 3$, $64$};
\node[draw] (wl3) at (0,-3) {$1 \times 1$, $256$};
\node (plus) at (0,-4) {$\bigoplus$};
\node (end) at (0,-4.75) {};
\draw[->,thick] (start) to node[near start,left] {$256$-d} (wl1);
\draw[->,thick] (wl1) to node[right] {relu} (wl2);
\draw[->,thick] (wl2) to node[right] {relu} (wl3);
\draw[->,thick] (wl3) to (plus);
\draw[->,thick] (plus) to node[right] {relu} (end);
\draw[->,thick] (0,-0.35) to[bend left=90,distance=5em] node[right,align=center] {identity} (plus);
\end{tikzpicture}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:

Binary file not shown.


@@ -1971,16 +1971,10 @@ from them.
\chapter{Prototype Design}
\label{chap:design}
\begin{enumerate}
\item Closely examine the used models (YOLOv7 and ResNet) regarding
their structure as well as unique features. Additionally, list the
augmentations which were done during training of the object
detector. Finally, elaborate on the process of hyperparameter
optimization (train/val structure, metrics, genetic evolution and
random search).
\end{enumerate}
Estimated 10 pages for this chapter.
The following sections establish the requirements as well as the
general design philosophy of the prototype. We will then go into
detail about the selected model architectures and data augmentations
which are applied during training.
\section{Requirements}
\label{sec:requirements}
@@ -2350,7 +2344,7 @@ weight updates which can stop the learning process entirely.
There are multiple potential solutions to the vanishing gradient
problem. Different weight initialization schemes
\cite{glorot2010,sussillo2015} as well as batch normalization layers
\cite{glorot2010,sussillo2015} as well as \gls{bn} layers
\cite{ioffe2015} can help mitigate the problem. The most effective
solution yet, however, was proposed as \emph{residual connections} by
\textcite{he2016}. Instead of connecting each layer only to the
@@ -2364,37 +2358,88 @@ figure~\ref{fig:residual-connection}).
\includegraphics[width=0.35\textwidth]{graphics/residual-connection/res.pdf}
\caption[Residual connection]{Residual connections: information from
previous layers flows into subsequent layers before the activation
function is applied. The symbol $\bigoplus$ represents simple
element-wise addition. Figure redrawn from \textcite{he2016}.}
function is applied. The shortcut connection provides a path for
information to \emph{skip} multiple layers. These connections are
parameter-free because of the identity mapping. The symbol
$\bigoplus$ represents simple element-wise addition. Figure
redrawn from \textcite{he2016}.}
\label{fig:residual-connection}
\end{figure}
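In the notation of \textcite{he2016}, such a building block computes
\[
  \mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\]
where $\mathbf{x}$ and $\mathbf{y}$ are the input and output of the
block, $\mathcal{F}$ is the residual mapping learned by the stacked
weight layers, and the activation function is applied after the
addition.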
\textcite{he2016} develop a new architecture called \emph{ResNet}
based on VGGNet (see section~\ref{sssec:theory-vggnet}) which includes
residual connections after every second convolutional layer. The
filter sizes in their approach are smaller than in VGGNet, which
results in far fewer trainable parameters overall. Since residual
connections do not add any parameters and are relatively easy to add
to existing network structures, the authors compare four versions of
their architecture: one with $18$ and one with $34$ layers, each with
(ResNet) and without (plain network) residual connections. Curiously,
the $34$-layer \emph{plain} network performs
worse on ImageNet classification than the $18$-layer plain
network. Once residual connections are used, however, the $34$-layer
network outperforms the $18$-layer version by $2.85$ percentage points
on the top-1 error metric of ImageNet.
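To make this structure concrete, the following is a minimal sketch of
such a residual block with two $3 \times 3$ convolutions in PyTorch.
It only illustrates the principle; the layer names and the fixed
channel count are our own simplifications and not the exact
implementation used by \textcite{he2016} or in our prototype.
\begin{verbatim}
import torch.nn.functional as F
from torch import nn

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with a parameter-free identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                          # shortcut adds no parameters
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                  # element-wise addition
        return F.relu(out)                    # activation after the addition
\end{verbatim}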
\begin{figure}
\centering
\includegraphics[width=0.3\textwidth]{graphics/bottleneck/bottleneck.pdf}
\caption[Bottleneck building block]{A bottleneck building block as
used in the ResNet-50, ResNet-101 and ResNet-152 architectures. The
$1 \times 1$ convolutions first reduce and then restore the number of
dimensions. The reduction lowers the input and output dimensions of
the $3 \times 3$ layer and thus improves training time. Figure
redrawn from \textcite{he2016} with minor modifications.}
\label{fig:bottleneck}
\end{figure}
In our own work we use the ResNet-50 model developed by
\textcite{he2016}, pretrained on ImageNet. The $50$-layer model uses
\emph{bottleneck building blocks} (see figure~\ref{fig:bottleneck})
instead of the two $3 \times 3$ convolutional layers which lie between
the residual connections of the smaller ResNet-18 and ResNet-34
models. We chose this model because it provides a suitable trade-off
between model complexity and inference time.
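As an illustration, an ImageNet-pretrained ResNet-50 can be obtained
from \texttt{torchvision} roughly as follows. The exact call depends
on the installed \texttt{torchvision} version (older releases use
\texttt{pretrained=True} instead of the \texttt{weights} argument),
and replacing the classification head is shown only as an example of
adapting the model to a different number of classes.
\begin{verbatim}
from torch import nn
from torchvision import models

# ImageNet-pretrained ResNet-50; all stages use bottleneck blocks.
model = models.resnet50(weights="IMAGENET1K_V1")

# Example only: swap the final fully connected layer when fine-tuning
# on a task with a different number of classes.
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)
\end{verbatim}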
\subsection{Data Augmentation}
\label{sec:methods-augmentation}
Go over the data augmentation methods which are used during training
for the object detector:
\begin{itemize}
\item HSV-hue
\item HSV-saturation
\item HSV-value
\item translation
\item scaling
\item inversion (left-right)
\item mosaic
\end{itemize}
Data augmentation is an essential part of virtually every training
process in machine learning. By \emph{perturbing} existing data with
transformations, model engineers artificially enlarge the data set,
which allows the model to learn more robust features. Augmentation can
also reduce overfitting on smaller data sets. In object detection,
special augmentations such as \emph{mosaic} help with edge cases which
might crop up during inference. For example, by combining four or more
training images into one, the model better learns to draw bounding
boxes around objects which are cut off at the edges of the individual
images. Since we use data augmentation extensively during the training
phases, we list a selection of the applied augmentations below,
followed by a sketch of how such settings are typically configured.
Estimated 1 page for this section.
\begin{description}
\item[HSV-hue] Randomly change the hue of the color channels.
\item[HSV-saturation] Randomly change the saturation of the color
channels.
\item[HSV-value] Randomly change the value of the color channels.
\item[Translation] Randomly \emph{translate}, that is, shift the image
  by a specified number of pixels.
\item[Scaling] Randomly scale the image up and down by a factor.
\item[Rotation] Randomly rotate the image.
\item[Inversion] Randomly flip the image along the $x$ or the
$y$-axis.
\item[Mosaic] Combine multiple images into one in a mosaic
arrangement.
\item[Mixup] Create a linear combination of multiple images.
\end{description}
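The listing below sketches how such augmentation settings are
typically collected in one place, using key names in the style of the
YOLOv7 hyperparameter files; the values are placeholders for
illustration and not the settings used in our training runs.
\begin{verbatim}
# Illustrative augmentation configuration (all values are placeholders).
augmentation = {
    "hsv_h":     0.015,  # HSV-hue jitter
    "hsv_s":     0.7,    # HSV-saturation jitter
    "hsv_v":     0.4,    # HSV-value jitter
    "degrees":   0.0,    # rotation range in degrees
    "translate": 0.1,    # translation as a fraction of the image size
    "scale":     0.5,    # scaling gain
    "fliplr":    0.5,    # probability of a left-right flip
    "flipud":    0.0,    # probability of an up-down flip
    "mosaic":    1.0,    # probability of the mosaic augmentation
    "mixup":     0.1,    # probability of the mixup augmentation
}
\end{verbatim}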
\subsection{Hyperparameter Optimization}
\label{sec:methods-hypopt}
Go into detail about the process used to optimize the detection and
classification models, what the training set looks like and how a
best-performing model was selected on the basis of the metrics.
Estimated 2 pages for this section.
These augmentations can either be applied with a fixed value and a
specified probability, or they can be applied to all images with a
value that is not fixed. For example, one can specify a range for the
rotation angle so that every image is rotated by a random value within
that range. The two options can also be combined: an image is then
rotated by a random value within a range with a specified probability.
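A minimal sketch of the combined variant, a rotation that is applied
with a given probability and with a random angle drawn from a range,
could look as follows. The function and parameter names are our own,
and \texttt{image.rotate} stands in for whatever image library is used
(for instance PIL).
\begin{verbatim}
import random

def maybe_rotate(image, max_degrees=10.0, p=0.5):
    # With probability p, rotate by a random angle in
    # [-max_degrees, +max_degrees].
    if random.random() < p:
        angle = random.uniform(-max_degrees, max_degrees)
        image = image.rotate(angle)  # PIL-style rotation (assumed interface)
    return image
\end{verbatim}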
\chapter{Prototype Implementation}
\label{chap:implementation}