Add test phase description for object detection model
parent 14353c79f3 · commit 840ebf07de

@@ -159,6 +159,20 @@
  keywords = {Computer Science - Computer Vision and Pattern Recognition}
}

@misc{lin2015,
  title = {Microsoft {{COCO}}: {{Common Objects}} in {{Context}}},
  shorttitle = {Microsoft {{COCO}}},
  author = {Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Bourdev, Lubomir and Girshick, Ross and Hays, James and Perona, Pietro and Ramanan, Deva and Zitnick, C. Lawrence and Dollár, Piotr},
  date = {2015-02-20},
  number = {arXiv:1405.0312},
  eprint = {1405.0312},
  eprinttype = {arxiv},
  publisher = {{arXiv}},
  doi = {10.48550/arXiv.1405.0312},
  archiveprefix = {arXiv},
  keywords = {Computer Science - Computer Vision and Pattern Recognition}
}
@article{lopez-garcia2022,
  title = {Machine {{Learning-Based Processing}} of {{Multispectral}} and {{RGB UAV Imagery}} for the {{Multitemporal Monitoring}} of {{Vineyard Water Status}}},
  author = {López-García, Patricia and Intrigliolo, Diego and Moreno, Miguel A. and Martínez-Moreno, Alejandro and Ortega, José Fernando and Pérez-Álvarez, Eva Pilar and Ballesteros, Rocío},
Binary file not shown.

@@ -28,7 +28,8 @@
\newcommand{\thesistitle}{Flower State Classification for Watering System} % The title of the thesis. The English version should be used, if it exists.

% Set PDF document properties
\hypersetup
{
  pdfpagelayout = TwoPageRight, % How the document is shown in PDF viewers (optional).
  linkbordercolor = {Melon}, % The color of the borders of boxes around crosslinks (optional).
  pdfauthor = {\authorname}, % The author's name in the document properties (optional).
@@ -68,6 +69,10 @@

\newacronym{xai}{XAI}{Explainable Artificial Intelligence}
\newacronym{lime}{LIME}{Local Interpretable Model Agnostic Explanation}
\newacronym{grad-cam}{Grad-CAM}{Gradient-weighted Class Activation Mapping}
\newacronym{oid}{OID}{Open Images Dataset}
\newacronym{ap}{AP}{Average Precision}
\newacronym{iou}{IOU}{Intersection over Union}
\newacronym{map}{mAP}{mean average precision}

\begin{document}
@@ -118,12 +123,30 @@

models' aggregate performance on the test set and discuss whether the
initial goals set by the problem description have been met or not.

\section{Object Detection}
\label{sec:yolo-eval}

The object detection model was pre-trained on the COCO~\cite{lin2015}
dataset and fine-tuned with data from the \gls{oid}~\cite{kuznetsova2020}
in its sixth version. Since the full \gls{oid} dataset contains
considerably more classes and samples than would be feasibly trainable
on a small cluster of GPUs, only images from the two classes
\emph{Plant} and \emph{Houseplant} have been downloaded. The samples
from the Houseplant class are merged into the Plant class because the
distinction between the two is not necessary for our model.
Furthermore, the \gls{oid} contains not only bounding box annotations
for object detection tasks, but also instance segmentations,
classification labels and more. These are not needed for our purposes
and are omitted as well. In total, the dataset consists of 91479
images with a roughly 85/5/10 split for training, validation and
testing, respectively.
\subsection{Training Phase}
\label{sec:yolo-training-phase}

The object detection model was trained for 300 epochs on 79204 images
with 284130 ground truth labels. The weights from the best-performing
epoch were saved. The model's fitness for each epoch is calculated as
the weighted average of \textsf{mAP}@0.5 and \textsf{mAP}@0.5:0.95:
\begin{equation}
  \label{eq:fitness}
@@ -145,7 +168,8 @@

until performance deteriorates due to overfitting.

\begin{figure}
  \centering
  \includegraphics{graphics/model_fitness.pdf}
  \caption[Model fitness per epoch.]{Model fitness for each epoch
    calculated as in equation~\ref{eq:fitness}. The vertical gray line
    at 133 marks the epoch with the highest fitness.}
  \label{fig:fitness}
\end{figure}
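The epoch-selection rule described above can be sketched in a few lines. The 0.1/0.9 weighting mirrors YOLOv5's default fitness weights and is an assumption here, as are the helper names; the thesis's exact weights are given by equation~\ref{eq:fitness}.

```python
def fitness(map50: float, map50_95: float) -> float:
    # Weighted average of mAP@0.5 and mAP@0.5:0.95.
    # Weights 0.1/0.9 are YOLOv5's defaults, assumed here.
    return 0.1 * map50 + 0.9 * map50_95

def best_epoch(history: list[tuple[float, float]]) -> int:
    # history[i] = (mAP@0.5, mAP@0.5:0.95) measured after epoch i;
    # return the index of the epoch with the highest fitness.
    scores = [fitness(m50, m5095) for m50, m5095 in history]
    return max(range(len(scores)), key=scores.__getitem__)
```

Saving the weights from `best_epoch` is what yields the epoch-133 checkpoint discussed below.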
@@ -155,22 +179,25 @@

nor recall change materially during training. In fact, precision
starts to decrease from the beginning, while recall experiences a
barely noticeable increase. Taken together with the box and object
loss from figure~\ref{fig:box-obj-loss}, we speculate that the
pre-trained model already generalizes well to plant detection because
one of the categories in the COCO~\cite{lin2015} dataset is
\emph{potted plant}. Any further training solely impacts the
confidence of detection, but does not lead to higher detection
rates. This conclusion is supported by the increasing
\textsf{mAP}@0.5:0.95 until epoch 133.
\begin{figure}
  \centering
  \includegraphics{graphics/precision_recall.pdf}
  \caption{Overall precision and recall during training for each
    epoch. The vertical gray line at 133 marks the epoch with the
    highest fitness.}
  \label{fig:prec-rec}
\end{figure}
Further culprits for the flat precision and recall values may be found
in bad ground truth data. The labels from the \gls{oid} are sometimes
not fine-grained enough. Images which contain multiple individual—often
overlapping—plants are labeled with one large bounding box instead of
multiple smaller ones. The model recognizes the individual plants and
returns tighter bounding boxes even if that is not what is specified
@@ -182,30 +209,78 @@

in a later stage. Smaller bounding boxes help the classifier to only
focus on one plant at a time and to not get distracted by multiple
plants in potentially different stages of wilting.

The box loss decreases slightly during training, which indicates that
the bounding boxes become tighter around objects of interest. With
increasing training time, however, the object loss increases,
indicating that fewer and fewer plants are present in the predicted
bounding boxes. It is likely that overfitting is a cause for the
increasing object loss from epoch 40 onward. Since the best weights as
measured by fitness are found at epoch 133 and the object loss
accelerates from that point, epoch 133 is probably the correct cutoff
before overfitting occurs.
\begin{figure}
  \centering
  \includegraphics{graphics/val_box_obj_loss.pdf}
  \caption[Box and object loss.]{Box and object loss measured against
    the validation set of 3091 images and 4092 ground truth
    labels. The class loss is omitted because there is only one class
    in the dataset and the loss is therefore always zero.}
  \label{fig:box-obj-loss}
\end{figure}

\subsection{Test Phase}
\label{ssec:test-phase}
Of the 91479 images, around 10\% were used for the test phase. These
images contain a total of 12238 ground truth
labels. Table~\ref{tab:yolo-metrics} shows precision, recall and the
harmonic mean of both (F1-score). The results indicate that the model
errs on the side of sensitivity because recall is higher than
precision. Although some detections are not labeled as plants in the
dataset, if there is a labeled plant in the ground truth data, the
chance is high that it will be detected. This behavior is in line with
how the model's detections are handled in practice. The detections are
drawn on the original image and the user is able to check the bounding
boxes visually. If there are wrong detections, the user can ignore
them and focus on the relevant ones instead. A higher recall thus
serves the user's needs better than a high precision.
\begin{table}[h]
  \centering
  \begin{tabular}{lrrrr}
    \toprule
    {} & Precision & Recall & F1-score & Support \\
    \midrule
    Plant & 0.547571 & 0.737866 & 0.628633 & 12238 \\
    \bottomrule
  \end{tabular}
  \caption{Precision, recall and F1-score for the object detection model.}
  \label{tab:yolo-metrics}
\end{table}
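The F1-score in table~\ref{tab:yolo-metrics} is simply the harmonic mean of the listed precision and recall and can be reproduced directly; the small helper below is illustrative, not part of the evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Reproduce the table row for the Plant class.
p, r = 0.547571, 0.737866
print(round(f1_score(p, r), 6))  # 0.628633
```

Because recall dominates precision here, the harmonic mean sits closer to the lower of the two values, which is why the F1-score of 0.63 is pulled down toward the precision.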
Figure~\ref{fig:yolo-ap} shows the \gls{ap} for the \gls{iou}
thresholds of 0.5 and 0.95. Predicted bounding boxes with an \gls{iou}
of less than 0.5 are not taken into account for the precision and
recall values of table~\ref{tab:yolo-metrics}. COCO's~\cite{lin2015}
main evaluation metric is the \gls{ap} averaged across the \gls{iou}
thresholds from 0.5 to 0.95 in 0.05 steps. This value is then averaged
across all classes and called \gls{map}. The object detection model
achieves a state-of-the-art \gls{map} of 0.5727 for the \emph{Plant}
class.
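The \gls{iou} threshold applied above compares predicted against ground-truth boxes. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format (illustrative only, not the thesis's evaluation code):

```python
def iou(a, b):
    # a, b: axis-aligned boxes as (x1, y1, x2, y2) corner coordinates.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Intersection is zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

A prediction whose best IoU against any ground-truth box falls below the chosen threshold counts as a false positive at that threshold.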
\begin{figure}[h]
  \centering
  \includegraphics{graphics/APpt5-pt95.pdf}
  \caption[Object detection AP@0.5 and AP@0.95.]{Precision-recall
    curves for \gls{iou} thresholds of 0.5 and 0.95. The \gls{ap} of a
    specific threshold is defined as the area under the
    precision-recall curve of that threshold. The \gls{map} across
    \gls{iou} thresholds from 0.5 to 0.95 in 0.05 steps
    (\textsf{mAP}@0.5:0.95) is 0.5727.}
  \label{fig:yolo-ap}
\end{figure}
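The \textsf{mAP}@0.5:0.95 reported in figure~\ref{fig:yolo-ap} is the mean of the ten per-threshold \gls{ap} values; a hedged sketch, where the AP values themselves (placeholders here) would come from the area under each precision-recall curve:

```python
def map_50_95(ap_values: list[float]) -> float:
    # ap_values: AP at IoU thresholds 0.50, 0.55, ..., 0.95 (ten values),
    # each the area under the precision-recall curve at that threshold.
    assert len(ap_values) == 10
    return sum(ap_values) / len(ap_values)
```

With only the single \emph{Plant} class in the dataset, averaging across classes is trivial, so this per-class value already equals the reported \gls{map} of 0.5727.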
\backmatter