Finish chapter 3

parent 35acd07570
commit 7b0662b728

thesis/graphics/bottleneck/bottleneck.pdf (new binary file, not shown)

thesis/graphics/bottleneck/bottleneck.tex (new file, 42 lines)
@@ -0,0 +1,42 @@
\documentclass{standalone}

\usepackage{tikz}

\usetikzlibrary{graphs,quotes}

\tikzstyle{block} = [draw, rectangle]
\tikzstyle{sum} = [draw, fill=blue!20, circle, node distance=1cm]
\tikzstyle{input} = [coordinate]
\tikzstyle{output} = [coordinate]
\tikzstyle{pinstyle} = [pin edge={to-,thin,black}]

\begin{document}

% \tikz\graph[grow down=3em]
% {
%   x [as=$\mathbf{x}$]
%   ->[thick] wl1 [block,as=weight layer]
%   ->[thick,"relu"] wl2 [block,as=weight layer]
%   ->[thick] plus [as=$\bigoplus$]
%   ->[thick,"relu"]
%   empty [as=];
%   x ->[thick,bend left=90,distance=5em,"$\mathbf{x}$"] plus;
% };

\begin{tikzpicture}
  \node (start) at (0,0) {};
  \node[draw] (wl1) at (0,-1) {$1 \times 1$, $64$};
  \node[draw] (wl2) at (0,-2) {$3 \times 3$, $64$};
  \node[draw] (wl3) at (0,-3) {$1 \times 1$, $256$};
  \node (plus) at (0,-4) {$\bigoplus$};
  \node (end) at (0,-4.75) {};

  \draw[->,thick] (start) to node[near start,left] {$256$-d} (wl1);
  \draw[->,thick] (wl1) to node[right] {relu} (wl2);
  \draw[->,thick] (wl2) to node[right] {relu} (wl3);
  \draw[->,thick] (wl3) to (plus);
  \draw[->,thick] (plus) to node[right] {relu} (end);
  \draw[->,thick] (0,-0.35) to[bend left=90,distance=5em]
    node[right,align=center] {identity} (plus);
\end{tikzpicture}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:
@@ -1971,16 +1971,10 @@ from them.

\chapter{Prototype Design}
\label{chap:design}

The following sections establish the requirements as well as the
general design philosophy of the prototype. We will then go into
detail about the selected model architectures and data augmentations
which are applied during training.

\section{Requirements}
\label{sec:requirements}
@@ -2350,7 +2344,7 @@ weight updates which can stop the learning process entirely.

There are multiple potential solutions to the vanishing gradient
problem. Different weight initialization schemes
\cite{glorot2010,sussillo2015} as well as \gls{bn} layers
\cite{ioffe2015} can help mitigate the problem. The most effective
solution yet, however, was proposed as \emph{residual connections} by
\textcite{he2016}. Instead of connecting each layer only to the
@@ -2364,37 +2358,88 @@ figure~\ref{fig:residual-connection}).

  \includegraphics[width=0.35\textwidth]{graphics/residual-connection/res.pdf}
  \caption[Residual connection]{Residual connections: information from
    previous layers flows into subsequent layers before the activation
    function is applied. The shortcut connection provides a path for
    information to \emph{skip} multiple layers. These connections are
    parameter-free because of the identity mapping. The symbol
    $\bigoplus$ represents simple element-wise addition. Figure
    redrawn from \textcite{he2016}.}
  \label{fig:residual-connection}
\end{figure}
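
Written out, a residual block computes
\[
  \mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x},
\]
where $\mathbf{x}$ is the input of the block and
$\mathcal{F}(\mathbf{x})$ is the residual mapping learned by the
stacked weight layers. Because the identity term contributes a
constant gradient of one during backpropagation, the gradient signal
has a direct path to earlier layers, which counteracts the vanishing
gradient problem.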

\textcite{he2016} develop a new architecture called \emph{ResNet}
based on VGGNet (see section~\ref{sssec:theory-vggnet}) which includes
residual connections after every second convolutional layer. The
filter sizes in their approach are smaller than in VGGNet, which
results in far fewer trainable parameters overall. Since residual
connections do not add additional parameters and are relatively easy
to add to existing network structures, the authors compare four
versions of their architecture: one with $18$ and the other with $34$
layers, each with (ResNet) and without (plain ResNet) residual
connections. Curiously, the $34$-layer \emph{plain} network performs
worse on ImageNet classification than the $18$-layer plain
network. Once residual connections are used, however, the $34$-layer
network outperforms the $18$-layer version by $2.85$ percentage points
on the top-1 error metric of ImageNet.

\begin{figure}
  \centering
  \includegraphics[width=0.3\textwidth]{graphics/bottleneck/bottleneck.pdf}
  \caption[Bottleneck building block]{A bottleneck building block used
    in the ResNet-50, ResNet-101 and ResNet-152 architectures. The
    $1 \times 1$ convolutions serve first as a reduction and then as
    an inflation of dimensions. The dimension reduction results in
    lower input and output dimensions for the $3 \times 3$ layer and
    thus improves training time. Figure redrawn from \textcite{he2016}
    with our own small changes.}
  \label{fig:bottleneck}
\end{figure}

We use the ResNet-50 model developed by \textcite{he2016}, pretrained
on ImageNet, in our own work. The $50$-layer model uses
\emph{bottleneck building blocks} instead of the two $3 \times 3$
convolutional layers which lie in-between the residual connections of
the smaller ResNet-18 and ResNet-34 models. We chose this model
because it provides a suitable trade-off between model complexity and
inference time.
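
Writing $\sigma$ for the ReLU activation and omitting batch
normalization for brevity, the bottleneck block computes
\[
  \mathbf{y} = \sigma\bigl(W_{1 \times 1}^{256}\,
  \sigma\bigl(W_{3 \times 3}^{64}\,
  \sigma\bigl(W_{1 \times 1}^{64}\,\mathbf{x}\bigr)\bigr)
  + \mathbf{x}\bigr),
\]
where each $W$ denotes a convolution with the given kernel size and
the superscript gives its number of output channels.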

\subsection{Data Augmentation}
\label{sec:methods-augmentation}

Data augmentation is an essential part of every training process
throughout machine learning. By \emph{perturbing} already existing
data with transformations, model engineers achieve an artificial
enlargement of the data set, which allows the machine learning model
to learn more robust features. It can also reduce overfitting for
smaller data sets. In the object detection world, special
augmentations such as \emph{mosaic} help with edge cases which might
crop up during inference. For example, by combining four or more
images of the training set into one, the model better learns to draw
bounding boxes around objects which are cut off at the edges of the
individual images. Since we use data augmentation extensively during
the training phases, we list a small selection of them below.

\begin{description}
\item[HSV-hue] Randomly change the hue of the color channels.
\item[HSV-saturation] Randomly change the saturation of the color
  channels.
\item[HSV-value] Randomly change the value of the color channels.
\item[Translation] Randomly \emph{translate}, that is, move the image
  by a specified number of pixels.
\item[Scaling] Randomly scale the image up or down by a factor.
\item[Rotation] Randomly rotate the image.
\item[Inversion] Randomly flip the image along the $x$- or the
  $y$-axis.
\item[Mosaic] Combine multiple images into one in a mosaic
  arrangement.
\item[Mixup] Create a linear combination of multiple images.
\end{description}
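
The mixup augmentation, for example, forms a convex combination of
two training images $\mathbf{x}_i$ and $\mathbf{x}_j$ (and,
analogously, of their labels):
\[
  \tilde{\mathbf{x}} = \lambda \mathbf{x}_i + (1 - \lambda) \mathbf{x}_j,
  \qquad \lambda \in [0, 1].
\]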

These augmentations can either be defined to happen with a fixed
value and a specified probability, or they can be applied to all
images with a value that is not fixed. For example, one can specify a
range for the degree of rotation, and every image is rotated by a
random value within that range. These two options can also be
combined to rotate an image by a random value within a range with a
specified probability.
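
As a sketch, a randomized rotation with probability $p$ and range $r$
draws an angle $\theta \sim \mathcal{U}(-r, r)$ and, with probability
$p$, replaces the image $\mathbf{x}$ by the rotated image
\[
  \mathbf{x}' = R_{\theta}\,\mathbf{x},
\]
where $R_{\theta}$ denotes rotation by $\theta$; otherwise
$\mathbf{x}$ is left unchanged.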

\chapter{Prototype Implementation}
\label{chap:implementation}