Finish YOLO selected methods section
This commit is contained in:
parent
f664ad2b40
commit
35acd07570
@ -132,6 +132,14 @@ Challenge}
\newacronym{bn}{BN}{Batch Normalization}
\newacronym{uav}{UAV}{Unmanned Aerial Vehicle}
\newacronym{csi}{CSI}{Camera Serial Interface}
\newacronym{nms}{NMS}{Non Maximum Suppression}
\newacronym{sam}{SAM}{Spatial Attention Module}
\newacronym{panet}{PANet}{Path Aggregation Network}
\newacronym{ciou}{CIoU}{Complete Intersection over Union}
\newacronym{siou}{SIoU}{Scylla Intersection over Union}
\newacronym{giou}{GIoU}{Generalized Intersection over Union}
\newacronym{elan}{ELAN}{Efficient Layer Aggregation Network}
\newacronym{eelan}{E-ELAN}{Extended Efficient Layer Aggregation Network}

\begin{document}

@ -2084,20 +2092,18 @@ models.
\section{Selected Methods}
\label{sec:selected-methods}

Estimated 7 pages for this section.
In the following sections we will go into detail about the two
selected architectures for our prototype. The object detector we
chose---\gls{yolo}v7---is part of a larger family of models which all
function similarly, but have undergone substantial changes from
version to version. To understand the model we use, we trace the
improvements to the \gls{yolo} family from version one to version
seven. For the classification stage, we have opted for a ResNet
architecture, which is also described in detail.

\subsection{You Only Look Once}
\label{sec:methods-detection}

Describe the inner workings of the YOLOv7 model structure and contrast
it with previous versions as well as other object detectors. What has
changed and how did these improvements manifest themselves? Reference
the original paper~\cite{wang2022} and papers of previous versions of
the same model (YOLOv5~\cite{jocher2022},
YOLOv4~\cite{bochkovskiy2020}).

Estimated 2 pages for this section.

The \gls{yolo} family of object detection models started in 2015 when
\textcite{redmon2016} published the first version. Since then there
have been up to 16 updated versions depending on how one counts. The
@ -2205,16 +2211,130 @@ the \gls{voc} 2007 data set compared to 63.4\% of the previous
at \qty{40}{fps} (\gls{map} 78.6\%) and up to \qty{91}{fps} (\gls{map}
69\%).

\subsubsection{\gls{yolo}v3}
\label{sssec:yolov3}

\gls{yolo}v3 \cite{redmon2018} provided additional updates to the
\gls{yolo}v2 model. To be competitive with the deeper network
structures of state-of-the-art models at the time, the authors
introduced a deeper feature extractor called Darknet-53. It makes use
of the residual connections popularized by ResNet \cite{he2016} (see
section~\ref{sssec:theory-resnet}). Darknet-53 is more accurate than
Darknet-19, achieves accuracy comparable to ResNet-101, and can
process more images per second (\qty{78}{fps} versus
\qty{53}{fps}). The activation function throughout the network is
still leaky \gls{relu}, as in earlier versions.

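The residual connections mentioned above take the form
\[
\mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x},
\]
where $\mathbf{x}$ is the block input, $\mathcal{F}$ is the residual
mapping learned by the block's convolutional layers, and $\mathbf{y}$
is the block output. Because the shortcut passes the input through
unchanged, gradients can flow directly to earlier layers, which eases
the training of the much deeper Darknet-53.
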
\gls{yolo}v3 uses multi-scale predictions to achieve better detection
rates across object sizes. Inspired by \glspl{fpn} (see
section~\ref{sssec:theory-fpn}), \gls{yolo}v3 takes feature maps at
different scales from the feature extractor and combines them to form
a final prediction. Combining the features from multiple scales is
often done in the \emph{neck} of the object detection architecture.

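To illustrate the idea, the following PyTorch-style sketch fuses a
coarse feature map with a finer one by upsampling and concatenation;
the channel counts and layer choices are placeholders and do not
correspond to the actual \gls{yolo}v3 configuration.
\begin{verbatim}
import torch
import torch.nn as nn

class TinyNeck(nn.Module):
    """Toy neck: upsample a coarse feature map, fuse it with a finer one."""

    def __init__(self, coarse_ch=256, fine_ch=128, out_ch=128):
        super().__init__()
        self.reduce = nn.Conv2d(coarse_ch, fine_ch, kernel_size=1)
        self.fuse = nn.Conv2d(2 * fine_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, coarse, fine):
        # coarse: (N, 256, H, W), fine: (N, 128, 2H, 2W)
        x = nn.functional.interpolate(self.reduce(coarse), scale_factor=2)
        x = torch.cat([x, fine], dim=1)  # concatenate along the channel axis
        return self.fuse(x)              # fused map at the finer resolution

neck = TinyNeck()
fused = neck(torch.randn(1, 256, 13, 13), torch.randn(1, 128, 26, 26))
print(fused.shape)  # torch.Size([1, 128, 26, 26])
\end{verbatim}
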
Around the time of the publication of \gls{yolo}v3, researchers
started to use the terminology \emph{backbone}, \emph{neck} and
\emph{head} to describe the architecture of object detection
models. The feature extractor (Darknet-53 in this case) is the
\emph{backbone} and provides the feature maps which are aggregated in
the \emph{neck} and passed to the \emph{head} which outputs the final
predictions. In some cases there are additional postprocessing steps
in the head such as \gls{nms} to eliminate duplicate or suboptimal
detections.

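A minimal sketch of such a greedy \gls{nms} step is shown below; the
corner-coordinate box format and the overlap threshold of $0.5$ are
illustrative assumptions rather than the settings of any particular
\gls{yolo} implementation.
\begin{verbatim}
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the best-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep  # indices of the retained detections
\end{verbatim}
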
While \gls{yolo}v2 had problems detecting small objects, \gls{yolo}v3
performs much better on them (\gls{ap} of 18.3\% versus 5\% on
\gls{coco}). The authors note, however, that the new model sometimes
has comparatively worse results with larger objects. The reasons for
this behavior are unknown. Additionally, \gls{yolo}v3 is still lagging
behind other detectors when it comes to accurately localizing
objects. The \gls{coco} evaluation metric was changed from the
previous \gls{ap}$_{0.5}$ to the \gls{map} averaged over \gls{iou}
thresholds between $0.5$ and $0.95$, which penalizes detectors that do
not achieve close to perfect \gls{iou} scores. This change highlights
\gls{yolo}v3's weakness in that area.

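Written out, the stricter metric averages the \gls{ap} over ten
\gls{iou} thresholds,
\[
\mathrm{mAP}_{0.5:0.95}
  = \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} \mathrm{AP}_{t},
\]
so a detector whose boxes only roughly overlap the ground truth still
scores well at a threshold of $0.5$ but contributes little at the
stricter thresholds.
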
\subsubsection{\gls{yolo}v4}
\label{sssec:yolov4}

Keeping in line with the aim of carefully balancing accuracy and speed
of detection, \textcite{bochkovskiy2020} published the fourth version
of \gls{yolo}. The authors investigate the use of what they term
\emph{bag of freebies}---methods which increase training time while
increasing inference accuracy without sacrificing inference speed. A
prominent example of such methods is data augmentation (see
section~\ref{sec:methods-augmentation}). Specifically, the authors
propose to use mosaic augmentation which lowers the need for large
mini-batch sizes. They also use new features such as weighted residual
connections \cite{shen2016}, a modified \gls{sam} \cite{woo2018}, a
modified \gls{panet} \cite{liu2018} for the neck, the \gls{ciou} loss
\cite{zheng2020} for bounding box regression, and the Mish activation
function \cite{misra2020}.

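The \gls{ciou} loss extends the plain \gls{iou}-based loss with two
penalty terms, one for the distance between the box centers and one
for inconsistent aspect ratios \cite{zheng2020}:
\[
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU}
  + \frac{\rho^2\bigl(\mathbf{b}, \mathbf{b}^{gt}\bigr)}{c^2}
  + \alpha v,
\]
where $\rho$ denotes the Euclidean distance between the predicted and
ground-truth box centers $\mathbf{b}$ and $\mathbf{b}^{gt}$, $c$ is
the diagonal length of the smallest box enclosing both, $v$ measures
the difference in aspect ratio, and $\alpha$ is a positive trade-off
weight.
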
Taken together, these additional improvements yield a \gls{map} of
43.5\% on the \gls{coco} test set while maintaining a speed of above
\qty{30}{fps} on modern \glspl{gpu}. \gls{yolo}v4 was the first
version which provided results for all object scales (small, medium,
and large) that were better than almost all other detectors at the
time without sacrificing speed.

\subsubsection{\gls{yolo}v5}
\label{sssec:yolov5}

The author of \gls{yolo}v5 \cite{jocher2020} ported the \gls{yolo}v4
code from the Darknet framework to PyTorch, which facilitated better
interoperability with other Python utilities. New in this version is
a pretraining algorithm called AutoAnchor which adjusts the anchor
boxes based on the data set at hand. This version also implements a
genetic algorithm for hyperparameter optimization (see
section~\ref{ssec:hypopt-evo}) which is used in our work as well.

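The anchor-fitting idea can be sketched as a plain $k$-means
clustering over the width--height pairs of the training boxes, as
below; this simplification clusters by Euclidean distance and omits
refinements such as the genetic mutation of the candidate anchors
mentioned above.
\begin{verbatim}
import random

def kmeans_anchors(wh, k=9, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchor shapes with k-means."""
    random.seed(seed)
    centers = random.sample(wh, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in wh:
            # assign each box to the nearest current anchor
            j = min(range(k),
                    key=lambda i: (w - centers[i][0]) ** 2
                                  + (h - centers[i][1]) ** 2)
            clusters[j].append((w, h))
        for i, c in enumerate(clusters):
            if c:  # keep the old center if a cluster runs empty
                centers[i] = (sum(w for w, _ in c) / len(c),
                              sum(h for _, h in c) / len(c))
    return sorted(centers, key=lambda a: a[0] * a[1])

# toy usage: bounding box sizes in pixels from a hypothetical data set
anchors = kmeans_anchors([(12, 18), (15, 22), (60, 80),
                          (64, 90), (200, 240), (210, 260)], k=3)
\end{verbatim}
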
Version 5 comes in multiple architectures of varying complexity. The
smallest---and therefore fastest---version is called \gls{yolo}v5n,
where the \emph{n} stands for \emph{nano}. Additional versions with
increasing parameter counts are \gls{yolo}v5s (small), \gls{yolo}v5m
(medium), \gls{yolo}v5l (large), and \gls{yolo}v5x (extra large). The
smaller models are intended to be used in resource-constrained
environments such as edge devices, but come with a cost in
accuracy. Conversely, the larger models are for tasks where high
accuracy is paramount and enough computational resources are
available. The \gls{yolo}v5x model achieves a \gls{map} of 50.7\% on
the \gls{coco} test data set.

\subsubsection{\gls{yolo}v6}
\label{sssec:yolov6}

The authors of \gls{yolo}v6 \cite{li2022a} use a new backbone based on
RepVGG \cite{ding2021} which they call EfficientRep. They also use
different losses for classification (Varifocal loss \cite{zhang2021})
and bounding box regression (\gls{siou}
\cite{gevorgyan2022}/\gls{giou} \cite{rezatofighi2019}). \gls{yolo}v6
is made available in eight scaled versions, of which the largest
achieves a \gls{map} of 57.2\% on the \gls{coco} test set.

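For reference, \gls{giou} extends the plain \gls{iou} to
non-overlapping boxes, where the \gls{iou} is zero and thus provides
no useful gradient, by penalizing the unused area of the smallest
enclosing box $C$ \cite{rezatofighi2019}:
\[
\mathrm{GIoU} = \mathrm{IoU}
  - \frac{\lvert C \setminus (A \cup B) \rvert}{\lvert C \rvert},
\qquad
\mathcal{L}_{\mathrm{GIoU}} = 1 - \mathrm{GIoU},
\]
where $A$ and $B$ denote the predicted and the ground-truth box.
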
\subsubsection{\gls{yolo}v7}
\label{sssec:yolov7}

At the time of implementation of our own plant detector, \gls{yolo}v7
\cite{wang2022b} was the newest version within the \gls{yolo}
family. Similarly to \gls{yolo}v4, it introduces additional trainable
bag-of-freebies methods which do not impact inference time. The
improvements include the use of \glspl{eelan} (based on \glspl{elan}
\cite{wang2022a}), joint depth and width model scaling techniques,
module-level reparameterization, and an auxiliary head---similar to
GoogLeNet (see section~\ref{sssec:theory-googlenet})---which assists
during training. The model does not use a pretrained backbone;
instead, it is trained from scratch on the \gls{coco} data set. These
changes result in much smaller model sizes compared to \gls{yolo}v4
and a \gls{map} of 56.8\% with a detection speed of over
\qty{30}{fps}.

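Schematically, the auxiliary head adds a second, down-weighted loss
term during training,
\[
\mathcal{L}_{\mathrm{train}}
  = \mathcal{L}_{\mathrm{lead}} + \lambda\,\mathcal{L}_{\mathrm{aux}},
\qquad 0 < \lambda < 1,
\]
where $\mathcal{L}_{\mathrm{lead}}$ is the loss of the final (lead)
head and $\mathcal{L}_{\mathrm{aux}}$ that of the auxiliary head,
which is discarded at inference time. The concrete weighting and the
coarse-to-fine label assignment used by \textcite{wang2022b} are more
elaborate than this sketch suggests.
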
We use \gls{yolo}v7 in our own work during the plant detection stage
because it was the fastest and most accurate object detector at the
time of implementation.

\subsection{ResNet}
\label{sec:methods-classification}

Introduce the approach of the \emph{ResNet} networks which implement
residual connections to allow training deeper networks. Describe the
inner workings of the ResNet model structure. Reference the original
paper~\cite{he2016}.

Estimated 2 pages for this section.

Early research \cite{bengio1994,glorot2010} already demonstrated that
the vanishing/exploding gradient problem with standard gradient
descent and random initialization adversely affects convergence during
@ -3099,8 +3219,8 @@ Estimated 1 page for this section
\listoftables % Starred version, i.e., \listoftables*, removes the toc entry.

% Use an optional list of algorithms.
\listofalgorithms
\addcontentsline{toc}{chapter}{List of Algorithms}
% \listofalgorithms
% \addcontentsline{toc}{chapter}{List of Algorithms}

% Add an index.
\printindex
@ -3117,18 +3237,4 @@ Estimated 1 page for this section
%%% mode: latex
%%% TeX-master: "thesis"
%%% End: