Finish chapter 3

parent 35acd07570
commit 7b0662b728
BIN  thesis/graphics/bottleneck/bottleneck.pdf (new file)
Binary file not shown.
thesis/graphics/bottleneck/bottleneck.tex (new file, 42 lines)
@@ -0,0 +1,42 @@
\documentclass{standalone}
\usepackage{tikz}

\usetikzlibrary{graphs,quotes}
\tikzstyle{block} = [draw, rectangle]
\tikzstyle{sum} = [draw, fill=blue!20, circle, node distance=1cm]
\tikzstyle{input} = [coordinate]
\tikzstyle{output} = [coordinate]
\tikzstyle{pinstyle} = [pin edge={to-,thin,black}]

\begin{document}
% \tikz\graph[grow down=3em]
% {
%   x [as=$\mathbf{x}$]
%   ->[thick] wl1 [block,as=weight layer]
%   ->[thick,"relu"] wl2 [block,as=weight layer]
%   ->[thick] plus [as=$\bigoplus$]
%   ->[thick,"relu"]
%   empty [as=];
%   x ->[thick,bend left=90,distance=5em,"$\mathbf{x}$"] plus;
% };

\begin{tikzpicture}
  \node (start) at (0,0) {};
  \node[draw] (wl1) at (0,-1) {$1 \times 1$, $64$};
  \node[draw] (wl2) at (0,-2) {$3 \times 3$, $64$};
  \node[draw] (wl3) at (0,-3) {$1 \times 1$, $256$};
  \node (plus) at (0,-4) {$\bigoplus$};
  \node (end) at (0,-4.75) {};

  \draw[->,thick] (start) to node[near start,left] {$256$-d} (wl1);
  \draw[->,thick] (wl1) to node[right] {relu} (wl2);
  \draw[->,thick] (wl2) to node[right] {relu} (wl3);
  \draw[->,thick] (wl3) to (plus);
  \draw[->,thick] (plus) to node[right] {relu} (end);
  \draw[->,thick] (0,-0.35) to[bend left=90,distance=5em] node[right,align=center] {identity} (plus);
\end{tikzpicture}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:
Binary file not shown.
@@ -1971,16 +1971,10 @@ from them.
 \chapter{Prototype Design}
 \label{chap:design}

-\begin{enumerate}
-\item Closely examine the used models (YOLOv7 and ResNet) regarding
-  their structure as well as unique features. Additionally, list the
-  augmentations which were done during training of the object
-  detector. Finally, elaborate on the process of hyperparameter
-  optimization (train/val structure, metrics, genetic evolution and
-  random search).
-\end{enumerate}
-
-Estimated 10 pages for this chapter.
+The following sections establish the requirements as well as the
+general design philosophy of the prototype. We will then go into
+detail about the selected model architectures and data augmentations
+which are applied during training.

 \section{Requirements}
 \label{sec:requirements}
@@ -2350,7 +2344,7 @@ weight updates which can stop the learning process entirely.

 There are multiple potential solutions to the vanishing gradient
 problem. Different weight initialization schemes
-\cite{glorot2010,sussillo2015} as well as batch normalization layers
+\cite{glorot2010,sussillo2015} as well as \gls{bn} layers
 \cite{ioffe2015} can help mitigate the problem. The most effective
 solution yet, however, was proposed as \emph{residual connections} by
 \textcite{he2016}. Instead of connecting each layer only to the
@@ -2364,37 +2358,88 @@ figure~\ref{fig:residual-connection}).
   \includegraphics[width=0.35\textwidth]{graphics/residual-connection/res.pdf}
   \caption[Residual connection]{Residual connections: information from
     previous layers flows into subsequent layers before the activation
-    function is applied. The symbol $\bigoplus$ represents simple
-    element-wise addition. Figure redrawn from \textcite{he2016}.}
+    function is applied. The shortcut connection provides a path for
+    information to \emph{skip} multiple layers. These connections are
+    parameter-free because of the identity mapping. The symbol
+    $\bigoplus$ represents simple element-wise addition. Figure
+    redrawn from \textcite{he2016}.}
   \label{fig:residual-connection}
 \end{figure}
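
In the notation of \textcite{he2016}, the building block shown in this figure computes

\begin{equation*}
  \mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\end{equation*}

where $\mathcal{F}$ is the residual mapping learned by the stacked weight layers and the addition is the parameter-free identity shortcut; the final relu is applied after the addition.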

+\textcite{he2016} develop a new architecture called \emph{ResNet}
+based on VGGNet (see section~\ref{sssec:theory-vggnet}), which
+includes residual connections after every second convolutional
+layer. The filter sizes in their approach are smaller than in VGGNet,
+which results in far fewer trainable parameters overall. Since
+residual connections add no additional parameters and are relatively
+easy to add to existing network structures, the authors compare four
+versions of their architecture: networks with $18$ and $34$ layers,
+each with (ResNet) and without (plain network) residual
+connections. Curiously, the $34$-layer \emph{plain} network performs
+worse on ImageNet classification than the $18$-layer plain
+network. Once residual connections are used, however, the $34$-layer
+network outperforms the $18$-layer version by $2.85$ percentage
+points on the top-1 error metric of ImageNet.
+
+\begin{figure}
+  \centering
+  \includegraphics[width=0.3\textwidth]{graphics/bottleneck/bottleneck.pdf}
+  \caption[Bottleneck building block]{A bottleneck building block
+    used in the ResNet-50, ResNet-101 and ResNet-152 architectures.
+    The one by one convolutions first reduce and then restore the
+    number of dimensions. The dimension reduction results in lower
+    input and output dimensions for the three by three layer and thus
+    improves training time. Figure redrawn from \textcite{he2016}
+    with our own small changes.}
+  \label{fig:bottleneck}
+\end{figure}
+
+We use the ResNet-50 model developed by \textcite{he2016}, pretrained
+on ImageNet, in our own work. The $50$-layer model uses
+\emph{bottleneck building blocks} instead of the two three by three
+convolutional layers which lie in-between the residual connections of
+the smaller ResNet-18 and ResNet-34 models. We chose this model
+because it provides a suitable trade-off between model complexity and
+inference time.
+
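As a sketch in our own notation (not verbatim from \textcite{he2016}, and omitting the \gls{bn} layers for brevity): writing $\sigma$ for the relu activation and $W_1$, $W_2$, $W_3$ for the $1 \times 1$, $3 \times 3$ and $1 \times 1$ convolutions of the bottleneck block, the block computes

\begin{equation*}
  \mathbf{y} = \sigma\bigl(W_3\,\sigma(W_2\,\sigma(W_1\mathbf{x})) + \mathbf{x}\bigr),
\end{equation*}

so the comparatively expensive $3 \times 3$ convolution operates on $64$ rather than $256$ channels.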
 \subsection{Data Augmentation}
 \label{sec:methods-augmentation}

-Go over the data augmentation methods which are used during training
-for the object detector:
-\begin{itemize}
-\item HSV-hue
-\item HSV-saturation
-\item HSV-value
-\item translation
-\item scaling
-\item inversion (left-right)
-\item mosaic
-\end{itemize}
+Data augmentation is an essential part of every training process
+throughout machine learning. By \emph{perturbing} existing data with
+transformations, model engineers achieve an artificial enlargement of
+the data set, which allows the machine learning model to learn more
+robust features. It can also reduce overfitting on smaller data
+sets. In the object detection world, special augmentations such as
+\emph{mosaic} help with edge cases which might crop up during
+inference. For example, by combining four or more images of the
+training set into one, the model learns to draw bounding boxes around
+objects which are cut off at the edges of the individual
+images. Since we use data augmentation extensively during the
+training phases, we list a small selection of the augmentations
+below.
+
-Estimated 1 page for this section.
+\begin{description}
+\item[HSV-hue] Randomly change the hue of the color channels.
+\item[HSV-saturation] Randomly change the saturation of the color
+  channels.
+\item[HSV-value] Randomly change the value of the color channels.
+\item[Translation] Randomly \emph{translate}, that is, move the image
+  by a specified amount of pixels.
+\item[Scaling] Randomly scale the image up or down by a factor.
+\item[Rotation] Randomly rotate the image.
+\item[Inversion] Randomly flip the image along the $x$- or the
+  $y$-axis.
+\item[Mosaic] Combine multiple images into one in a mosaic
+  arrangement.
+\item[Mixup] Create a linear combination of multiple images (sketched
+  in the equation below).
+\end{description}
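
To make the mixup entry concrete: two labeled samples $(\mathbf{x}_i, \mathbf{y}_i)$ and $(\mathbf{x}_j, \mathbf{y}_j)$ are combined linearly using a mixing coefficient $\lambda \in [0, 1]$, commonly drawn from a Beta distribution,

\begin{equation*}
  \tilde{\mathbf{x}} = \lambda\mathbf{x}_i + (1 - \lambda)\,\mathbf{x}_j,
  \qquad
  \tilde{\mathbf{y}} = \lambda\mathbf{y}_i + (1 - \lambda)\,\mathbf{y}_j.
\end{equation*}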
+
 \subsection{Hyperparameter Optimization}
 \label{sec:methods-hypopt}

-Go into detail about the process used to optimize the detection and
-classification models, what the training set looks like and how a
-best-performing model was selected on the basis of the metrics.
-
-Estimated 2 pages for this section.
+These augmentations can either be applied with a fixed value and a
+specified probability, or they can be applied to all images with a
+value that is not fixed. For example, one can specify a range for the
+degree of rotation, and every image is rotated by a random value
+within that range. These two options can also be combined to rotate
+an image by a random value within a range with a specified
+probability.
+
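The two modes can be written compactly (a sketch in our own notation): an augmentation $T_\theta$ with parameter $\theta$, for example the rotation angle, is applied as

\begin{equation*}
  \mathbf{x}' =
  \begin{cases}
    T_\theta(\mathbf{x}) & \text{with probability } p,\ \theta \sim \mathcal{U}(a, b),\\
    \mathbf{x} & \text{otherwise,}
  \end{cases}
\end{equation*}

where $a = b$ recovers the fixed-value variant and $p = 1$ the variant applied to all images.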
 \chapter{Prototype Implementation}
 \label{chap:implementation}
