Add design section in prototype design

Tobias Eidelpes 2023-11-23 17:11:27 +01:00
parent 1820d695f4
commit bfc9488602


@@ -130,6 +130,7 @@ Challenge}
\newacronym{se}{SE}{Squeeze-Excitation}
\newacronym{bn}{BN}{Batch Normalization}
\newacronym{uav}{UAV}{Unmanned Aerial Vehicle}
\newacronym{csi}{CSI}{Camera Serial Interface}
\begin{document}
@@ -1962,11 +1963,6 @@ from them.
\label{chap:design}
\begin{enumerate}
\item Describe the architecture of the prototype (two-stage approach
and how it is implemented with an object detector and
classifier). How the individual stages are connected (object
detector generates cutouts which are passed to classifier). Periodic
image capture and inference on the Jetson Nano.
\item Closely examine the models used (YOLOv7 and ResNet) regarding
their structure as well as their unique features. Additionally, list
the augmentations applied during training of the object
@@ -2013,10 +2009,6 @@ recall values of 70\%.
\section{Design}
\label{sec:design}
Reference the methods section (Section~\ref{sec:methods}) to explain
the two-stage structure of the approach. Reference the description of
the processing loop on the prototype in Figure~\ref{fig:setup}.
\begin{figure}
\centering
\includegraphics[width=0.8\textwidth]{graphics/setup.pdf}
@@ -2034,7 +2026,59 @@ loop on the prototype in Figure~\ref{fig:setup}.
\label{fig:setup}
\end{figure}
Estimated 1 page for this section.
Figure~\ref{fig:setup} shows the overall processing loop which runs
on the device. The camera is attached directly to the Nvidia Jetson
Nano via a \gls{csi} cable. Since the cable is quite rigid, the camera
must be mounted on a small \emph{stand} such as a tripod. Images
coming in from the camera are passed to the object detection model
running on the Jetson Nano. The model detects all plants in the image
and returns the coordinates of one bounding box per plant. These
coordinates are used to \emph{cut out} each plant from the original
image. Each cutout is then passed to the second model, also running on
the Jetson Nano, which determines whether the plant is water-stressed
or not. The percentage values of the prediction are mapped to a scale
from one to ten, where ten indicates that the plant is in a very dire
state. This score is made available via a \gls{rest} endpoint,
together with additional information such as the current time and how
long it has been since the score was last better than three. The
endpoint publishes this information for every detected plant.
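
The following listing sketches this loop in Python. It is a minimal
illustration under stated assumptions: \texttt{detect\_plants} and
\texttt{classify\_cutout} are placeholders for the two models, and the
bookkeeping behind the \gls{rest} endpoint is omitted.
\begin{verbatim}
# Sketch of the per-image processing loop (illustrative only;
# detect_plants and classify_cutout are placeholders for the two
# models, not the actual implementation).
import time

def process_frame(frame, detect_plants, classify_cutout):
    results = []
    for (x1, y1, x2, y2) in detect_plants(frame):
        cutout = frame[y1:y2, x1:x2]        # crop the plant
        p_stress = classify_cutout(cutout)  # stress probability in [0, 1]
        score = 1 + round(p_stress * 9)     # map onto the 1-10 scale
        results.append({"score": score, "timestamp": time.time()})
    return results  # published per plant via the REST endpoint
\end{verbatim}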

The water stress prediction itself consists of two stages: first,
plants are detected and, second, each individual plant is
classified. This structure lends itself well to a two-stage model
architecture. Since the first stage is an object detection task, we
employ an object detection model and pass the individual plant images
on to a second model, the classifier.
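
To make the hand-off between the two stages concrete, the following
sketch assumes a PyTorch/torchvision setup in which each cutout is
resized to $224 \times 224$ pixels, the usual ResNet input convention;
the exact preprocessing in our implementation may differ.
\begin{verbatim}
# Illustrative hand-off between detector and classifier (assumes
# torchvision; 224x224 follows the usual ResNet input convention).
from torchvision import transforms

to_classifier_input = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify_cutouts(frame, boxes, classifier):
    for (x1, y1, x2, y2) in boxes:
        cutout = frame[y1:y2, x1:x2]               # stage 1 output
        batch = to_classifier_input(cutout)[None]  # add batch dimension
        yield classifier(batch)                    # stage 2 input
\end{verbatim}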

While most object detection models could be trained to distinguish
water-stressed from healthy plants directly, the reason for the
two-stage design lies in the availability of data. To our knowledge,
no sufficiently large data sets exist which contain labels for
water-stressed and healthy plants. Instead, most data sets only cover
common object classes such as plane, person, car, bicycle, and so
forth (e.g.\ \gls{coco} \cite{lin2015}). However, the classes
\emph{plant} and \emph{houseplant} are present in most data sets and
provide the basis for our object detection model. The size of these
data sets allows us to train the object detection model on a large
number of samples, which would have been infeasible to label on our
own. The classifier is then trained on a smaller data set comprising
only individual plants and their associated class (\emph{stressed} or
\emph{healthy}).
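
As an illustration, the plant samples for the detector can be pulled
from a \gls{coco}-style annotation file as sketched below;
\emph{potted plant} is an actual \gls{coco} category name, while the
file path is a placeholder.
\begin{verbatim}
# Sketch: collect "potted plant" boxes from a COCO-style annotation
# file (the path is a placeholder).
import json

with open("annotations/instances_train2017.json") as f:
    coco = json.load(f)

plant_ids = {c["id"] for c in coco["categories"]
             if c["name"] == "potted plant"}
plant_boxes = [a for a in coco["annotations"]
               if a["category_id"] in plant_ids]
\end{verbatim}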

Both data sets (object detection and classification) only allow us to
train and validate each model separately. A third data set is needed
to evaluate the detection/classification pipeline as a whole. To this
end, we construct our own data set in which all plants per image are
labeled with bounding boxes as well as the classes \emph{stressed} or
\emph{healthy}. This data set is small compared to the one on which
the object detection model is trained, but it suffices because it is
only used for evaluation. Manually labeling each sample in the
evaluation data set is still a laborious task, which is why each image
is \emph{preannotated} by the already trained object detection and
classification models. The task of labeling thus becomes one of
manually correcting the annotations generated by the models.
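
A minimal sketch of this preannotation step is given below; the output
format and the way the models are invoked are assumptions for
illustration, not the tooling actually used.
\begin{verbatim}
# Sketch of preannotation: run the existing models over every
# evaluation image and store draft labels for manual correction.
# detector and classifier stand in for the already trained models.
import json
import pathlib

import numpy as np
from PIL import Image

def preannotate(image_dir, detector, classifier, out_file):
    draft = {}
    for path in sorted(pathlib.Path(image_dir).glob("*.jpg")):
        image = np.array(Image.open(path))
        entries = []
        for (x1, y1, x2, y2) in detector(image):
            cutout = image[y1:y2, x1:x2]
            label = "stressed" if classifier(cutout) > 0.5 else "healthy"
            entries.append({"box": [x1, y1, x2, y2], "label": label})
        draft[path.name] = entries
    with open(out_file, "w") as f:
        json.dump(draft, f, indent=2)  # annotators correct this file
\end{verbatim}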
\section{Selected Methods}
\label{sec:selected-methods}