diff --git a/thesis/thesis.tex b/thesis/thesis.tex
index e01bdf7..d8e960f 100644
--- a/thesis/thesis.tex
+++ b/thesis/thesis.tex
@@ -130,6 +130,7 @@ Challenge}
 \newacronym{se}{SE}{Squeeze-Excitation}
 \newacronym{bn}{BN}{Batch Normalization}
 \newacronym{uav}{UAV}{Unmanned Aerial Vehicle}
+\newacronym{csi}{CSI}{Camera Serial Interface}
 
 \begin{document}
 
@@ -1962,11 +1963,6 @@ from them.
 \label{chap:design}
 
 \begin{enumerate}
-\item Describe the architecture of the prototype (two-stage approach
-  and how it is implemented with an object detector and
-  classifier). How the individual stages are connected (object
-  detector generates cutouts which are passed to classifier). Periodic
-  image capture and inference on the Jetson Nano.
 \item Closely examine the used models (YOLOv7 and ResNet) regarding
   their structure as well as unique features. Additionally, list the
   augmentations which were done during training of the object
@@ -2013,10 +2009,6 @@ recall values of 70\%.
 \section{Design}
 \label{sec:design}
 
-Reference methods section (~\ref{sec:methods}) to explain two-stage
-structure of the approach. Reference the description of the processing
-loop on the prototype in Figure~\ref{fig:setup}.
-
 \begin{figure}
   \centering
   \includegraphics[width=0.8\textwidth]{graphics/setup.pdf}
@@ -2034,7 +2026,59 @@ loop on the prototype in Figure~\ref{fig:setup}.
   \label{fig:setup}
 \end{figure}
 
-Estimated 1 page for this section.
+Figure~\ref{fig:setup} shows the overall processing loop running on
+the device. The camera is attached directly to the Nvidia Jetson Nano
+via a \gls{csi} cable. Since the cable is quite rigid, the camera
+must be mounted on a small \emph{stand} such as a tripod. Images
+coming in from the camera are passed to the object detection model
+running on the Jetson Nano. The model detects all plants in the image
+and returns the coordinates of one bounding box per plant. These
+coordinates are used to \emph{cut out} each plant from the original
+image. Each cutout is then passed to the second model, which
+determines whether the plant is water-stressed or not. The
+classifier's prediction confidence is mapped to a scale from one to
+ten, where ten indicates that the plant is in a very dire state. This
+score is published via a \gls{rest} endpoint together with additional
+information such as the current time and the time elapsed since the
+score was last better (i.e., lower) than three. The endpoint provides
+this information for every detected plant.
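+
+A minimal sketch of this loop, written in Python, could look as
+follows. The functions \texttt{detect\_plants} and
+\texttt{classify\_cutout} are hypothetical placeholders for the two
+model wrappers, and the \gls{rest} endpoint is served with Flask; the
+sketch conveys the structure of the loop rather than the exact
+implementation.
+
+\begin{verbatim}
+import threading
+import time
+
+import cv2
+from flask import Flask, jsonify
+
+app = Flask(__name__)
+state = {}  # plant id -> latest reading
+state_lock = threading.Lock()
+
+def detect_plants(image):
+    """Hypothetical wrapper around the object detector. Returns an
+    iterable of (plant_id, (x, y, w, h)) bounding boxes, where
+    plant_id identifies a plant across frames."""
+    raise NotImplementedError
+
+def classify_cutout(cutout):
+    """Hypothetical wrapper around the classifier. Returns the
+    predicted probability that the plant is water-stressed."""
+    raise NotImplementedError
+
+def to_score(p_stressed):
+    # Map the prediction (0..1) to the scale 1..10, where 10 means
+    # the plant is in a very dire state.
+    return max(1, min(10, round(1 + 9 * p_stressed)))
+
+def capture_loop(interval_s=60):
+    # On the Jetson, opening the CSI camera may require a GStreamer
+    # pipeline string instead of a plain device index.
+    camera = cv2.VideoCapture(0)
+    while True:
+        ok, image = camera.read()
+        if ok:
+            now = time.time()
+            for plant_id, (x, y, w, h) in detect_plants(image):
+                cutout = image[y:y + h, x:x + w]
+                score = to_score(classify_cutout(cutout))
+                with state_lock:
+                    entry = state.setdefault(plant_id,
+                                             {"last_ok": now})
+                    if score < 3:  # better than three
+                        entry["last_ok"] = now
+                    entry.update(score=score, timestamp=now)
+        time.sleep(interval_s)
+
+@app.route("/plants")
+def plants():
+    # Publish score, timestamp and the time elapsed since the score
+    # was last better than three, for every detected plant.
+    with state_lock:
+        return jsonify({pid: {
+            "score": e["score"],
+            "timestamp": e["timestamp"],
+            "seconds_since_ok": e["timestamp"] - e["last_ok"],
+        } for pid, e in state.items()})
+
+if __name__ == "__main__":
+    threading.Thread(target=capture_loop, daemon=True).start()
+    app.run(host="0.0.0.0", port=8080)
+\end{verbatim}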
+
+The water-stress prediction itself consists of two stages: first,
+plants are detected and, second, each individual plant is
+classified. This structure maps naturally onto two separate
+models. Since the first stage is an object detection task, we employ
+an object detection model and pass the individual plant cutouts to a
+second model, the classifier.
+
+While most object detection models could in principle be trained to
+distinguish water-stressed from healthy plants directly, the reason
+for the two-stage design lies in the availability of data. To our
+knowledge, there are no sufficiently large data sets available that
+label plants as water-stressed or healthy. Instead, most data sets
+only classify common objects such as plane, person, car, bicycle, and
+so forth (e.g., \gls{coco}~\cite{lin2015}). However, the classes
+\emph{plant} and \emph{houseplant} are present in most of these data
+sets and provide the basis for our object detection model. The size
+of these data sets allows us to train the object detection model with
+a large number of samples, which would have been infeasible to label
+on our own. The classifier is then trained with a smaller data set
+comprising only individual plants and their associated class
+(\emph{stressed} or \emph{healthy}).
+
+Both data sets (object detection and classification) only allow us to
+train and validate each model separately. A third data set is needed
+to evaluate the detection/classification pipeline as a whole. To this
+end, we construct our own data set in which every plant in an image
+is labeled with a bounding box as well as one of the classes
+\emph{stressed} or \emph{healthy}. This data set is small compared to
+the one on which the object detection model is trained, but it
+suffices because it is only used for evaluation. Manually labeling
+each sample in the evaluation data set is still laborious, which is
+why each image is \emph{preannotated} by the existing object
+detection and classification models. Labeling thus becomes a matter
+of manually correcting the annotations generated by the models.
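+
+The preannotation step itself can be scripted. The following Python
+sketch runs both models over the raw evaluation images and writes one
+YOLO-format label file per image, which a labeling tool can then load
+for manual correction. As before, \texttt{detect\_plants} and
+\texttt{classify\_cutout} are hypothetical stand-ins for the actual
+model wrappers.
+
+\begin{verbatim}
+from pathlib import Path
+
+import cv2
+
+CLASS_IDS = {"healthy": 0, "stressed": 1}
+
+def preannotate(image_dir, label_dir):
+    """Write one YOLO-format label file per image, one line per box:
+    '<class> <x_center> <y_center> <width> <height>' (normalized)."""
+    label_dir = Path(label_dir)
+    label_dir.mkdir(parents=True, exist_ok=True)
+    for image_path in sorted(Path(image_dir).glob("*.jpg")):
+        image = cv2.imread(str(image_path))
+        if image is None:
+            continue
+        img_h, img_w = image.shape[:2]
+        lines = []
+        for _, (x, y, w, h) in detect_plants(image):
+            cutout = image[y:y + h, x:x + w]
+            p_stressed = classify_cutout(cutout)
+            cls = "stressed" if p_stressed >= 0.5 else "healthy"
+            lines.append("%d %.6f %.6f %.6f %.6f" % (
+                CLASS_IDS[cls],
+                (x + w / 2) / img_w,   # normalized box center x
+                (y + h / 2) / img_h,   # normalized box center y
+                w / img_w,             # normalized box width
+                h / img_h,             # normalized box height
+            ))
+        out = label_dir / (image_path.stem + ".txt")
+        out.write_text("\n".join(lines) + "\n")
+\end{verbatim}
 
 \section{Selected Methods}
 \label{sec:selected-methods}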