Fix various consistency errors

Tobias Eidelpes 2023-11-22 10:56:56 +01:00
parent bd56ced119
commit a3f0222a7f


@@ -183,7 +183,7 @@ learning.
Large-scale as well as small local farmers are able to survey their
fields and gardens with drones or stationary cameras to determine soil
and plant condition as well as when to water or
fertilize \cite{ramos-giraldo2020}. Machine learning models play an
important role in that process because they allow automated
decision-making in real time. While machine learning has been used in
large-scale agriculture, it is also a valuable tool for household
@@ -199,11 +199,11 @@ are numerous. First, gathering data in the field requires a network of
sensors which are linked to a central server for processing. Since
communication between sensors is difficult without proper
infrastructure, there is a high demand for processing the data on the
sensor itself \cite{mcenroe2022}. Second, differences in local soil,
plant and weather conditions require models to be optimized for these
diverse inputs. Centrally trained models often lose the nuances
present in the data because they have to provide actionable
information for a larger area \cite{awad2019}. Third, specialized
methods such as hyper- or multispectral imaging in the field provide
fine-grained information about the object of interest but come with
substantial upfront costs and are of limited interest for gardeners.
@@ -224,7 +224,7 @@ plants in the field of view and then to determine if the plants need
water or not. The model should be suitable for edge devices equipped
with a \gls{tpu} or \gls{gpu} but with otherwise limited processing
capabilities. Examples of such systems include Google's Coral
development board and the Nvidia Jetson series of \glspl{sbc}. The
model should make use of state-of-the-art algorithms from either
classical machine learning or deep learning. The literature review
will yield an appropriate machine learning method. Furthermore, the
@@ -325,19 +325,19 @@ further insights about the type of models which are commonly used.
In order to find and select appropriate datasets to train the models
on, we will survey the existing large datasets for classes we can
use. Datasets such as the \gls{coco} \cite{lin2015} and
\gls{voc} \cite{everingham2010} contain the highly relevant class
\emph{Potted Plant}. By extracting only these classes from multiple
datasets and concatenating them, it is possible to create one
unified dataset which only contains the classes necessary for training
the model.

The training of the models will happen in an environment where more
computational resources are available than what the \gls{sbc}
offers. We will deploy the final model with the \gls{api} to the
\gls{sbc} after training and optimization. Furthermore, training will
happen in tandem with a continuous evaluation process. After every
iteration of the model, an evaluation run against the test set
determines if there has been an improvement in performance. The
results of the evaluation feed back into the parameter selection at
the beginning of each training phase. Small changes to the training
@@ -357,7 +357,7 @@ has been met, and—if not—give reasons for the rejection of all or part
of the hypotheses.

Overall, the development of our application follows an evolutionary
prototyping process \cite{davis1992,sears2007}. Instead of producing a
full-fledged product from the start, development happens iteratively
in phases. The main phases and their order for the prototype at hand
are: model selection, implementation, and evaluation. The results of
@@ -404,7 +404,7 @@ results of the testing phases as well as the performance of the
aggregate model. Furthermore, the results are compared with the
expectations and it is discussed whether they are explainable in the
context of the task at hand as well as benchmark results from other
datasets (\gls{coco} \cite{lin2015}). Chapter~\ref{chap:conclusion}
concludes the thesis with a summary and an outlook on possible
improvements and further research questions.
@@ -685,8 +685,8 @@ network and is, therefore, not suitable for complex intra-data
relationships. A major downside to using the Heaviside step function
is that it is not differentiable at $x = 0$ and has a $0$ derivative
elsewhere. These properties make it unsuitable for use with gradient
descent during backpropagation
(section~\ref{ssec:theory-backprop}).
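Written out (the standard textbook definition, added here for reference), the problem is directly visible:

```latex
H(x) =
\begin{cases}
  0 & \text{if } x < 0\\
  1 & \text{if } x \geq 0
\end{cases}
\qquad
H'(x) = 0 \quad \text{for all } x \neq 0
```

Since gradient descent scales every weight update by this derivative, the updates are zero almost everywhere and no learning signal reaches the weights.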
\subsubsection{Sigmoid}
\label{sssec:theory-sigmoid}
@@ -852,28 +852,28 @@ there is the case of binary random variables, i.e. only two classes to
classify exist, the measure is called binary
cross-entropy. Cross-entropy loss is known to outperform \gls{mse} for
classification tasks and allows the model to be trained
faster \cite{simard2003}.
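For a ground-truth label $y \in \{0, 1\}$ and a predicted probability $\hat{y}$, binary cross-entropy takes the standard form (added here for reference):

```latex
L_{\mathrm{BCE}}(y, \hat{y}) = -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr]
```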
\subsection{Backpropagation}
\label{ssec:theory-backprop}
So far, information only flows forward through the network whenever a
prediction for a particular input should be made. In order for a
neural network to learn, information about the computed loss has to
flow backward through the network. Only then can the weights at the
individual neurons be updated. This type of information flow is termed
\emph{backpropagation} \cite{rumelhart1986}. Backpropagation computes
the gradient of a loss function with respect to the weights of a
network for an input-output pair. The algorithm computes the gradient
iteratively starting from the last layer and works its way backward
through the network until it reaches the first layer.

Strictly speaking, backpropagation only computes the gradient, but
does not determine how the gradient is used to learn the new
weights. Once the backpropagation algorithm has computed the gradient,
that gradient is passed to an algorithm which uses it to find a local
minimum of the loss. This step is usually performed by some variant of
gradient descent \cite{cauchy1847}.
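A minimal sketch of this two-step loop (backpropagation computes the gradient, gradient descent applies it) for a single sigmoid neuron with binary cross-entropy loss; the data and learning rate below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = np.array([0.5, -1.2]), 1.0  # one toy input-output pair
w, b = np.zeros(2), 0.0
lr = 0.1  # step size for gradient descent

for step in range(100):
    # Forward pass: prediction and binary cross-entropy loss.
    y_hat = sigmoid(w @ x + b)
    loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # Backward pass: for sigmoid + binary cross-entropy the chain
    # rule collapses to dL/dz = y_hat - y; the gradients with
    # respect to the weights and bias follow from there.
    dz = y_hat - y
    dw, db = dz * x, dz
    # Gradient descent step: move against the gradient.
    w -= lr * dw
    b -= lr * db

print(float(loss))  # the loss shrinks toward zero over the iterations
```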
\section{Object Detection}
\label{sec:background-detection}
@@ -900,7 +900,7 @@ time.
\label{sssec:obj-viola-jones}

The first milestone was the face detector by
\textcite{viola2001} which is able to perform face
detection on $384$ by $288$ pixel (grayscale) images with
\qty{15}{fps} on a \qty{700}{\MHz} Intel Pentium III processor. The
authors use an integral image representation where every pixel is the
@@ -909,7 +909,7 @@ representation allows them to quickly and efficiently calculate
Haar-like features.
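The representation is simple to sketch: after two cumulative sums, the sum over any rectangle costs only four lookups, which is what makes Haar-like features cheap to evaluate (illustrative code, not the authors' implementation):

```python
import numpy as np

def integral_image(img):
    # Entry (y, x) holds the sum of all pixels in img[:y+1, :x+1].
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    # Sum of img[top:bottom+1, left:right+1] from four lookups.
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()
```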
The Haar-like features are passed to a modified AdaBoost
algorithm \cite{freund1995} which only selects the (presumably) most
important features. At the end there is a cascading stage of
classifiers where regions are only considered further if they are
promising. Every additional classifier adds complexity, but once a
@@ -921,7 +921,7 @@ achieves comparable results to the state of the art in 2001.
\subsubsection{HOG Detector}
\label{sssec:obj-hog}

The \gls{hog} \cite{dalal2005} is a feature descriptor used in
computer vision and image processing to detect objects in images. It
is a shape-based detector, like other methods such as
\gls{sift} \cite{lowe1999}. The idea is to use the distribution of
@@ -940,14 +940,14 @@ with images of 64 by 128 pixels and make sure that the image contains
a margin of 16 pixels around the person. Decreasing the border by
either enlarging the person or reducing the overall image size results
in worse performance. Unfortunately, their method is far from being
able to process images in real time—a $320$ by $240$ image takes
roughly a second to process.
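For experimentation, the descriptor is available off the shelf; the snippet below uses scikit-image on the 64 by 128 detection window mentioned above, with parameters mirroring the commonly cited Dalal--Triggs configuration (the random window is a placeholder):

```python
import numpy as np
from skimage.feature import hog

window = np.random.rand(128, 64)  # stand-in for a 64x128 grayscale crop

# 9 orientation bins, 8x8 pixel cells, and 2x2 cell blocks with
# L2-Hys normalization roughly match the original detector setup.
descriptor = hog(
    window,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    block_norm="L2-Hys",
)
print(descriptor.shape)  # one flat feature vector per window
```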
\subsubsection{Deformable Part-Based Model}
\label{sssec:obj-dpm}

\glspl{dpm} \cite{felzenszwalb2008a} were the winners of the \gls{voc}
challenge in the years 2007, 2008, and 2009. The method is heavily
based on the previously discussed \gls{hog} since it also uses
\gls{hog} descriptors internally. The authors' addition is the idea of
learning how to decompose objects during training and
@@ -1008,25 +1008,25 @@ often not as efficient as one-stage detectors.
\textcite{girshick2014} were the first to propose using feature
representations of \glspl{cnn} for object detection. Their approach
consists of generating around $2000$ region proposals and passing
these on to a \gls{cnn} for feature extraction. The fixed-length
feature vector is used as input for a linear \gls{svm} which
classifies the region. They name their method R-\gls{cnn}, where the R
stands for region.

R-\gls{cnn} uses selective search to generate region proposals
\cite{uijlings2013}. The authors use selective search's \emph{fast
mode} to generate the $2000$ proposals and warp (i.e. aspect ratios
are not retained) each proposal into the image dimensions required by
the \gls{cnn}. The \gls{cnn}, which matches the architecture of
AlexNet \cite{krizhevsky2012}, generates a $4096$-dimensional feature
vector and each feature vector is scored by a linear \gls{svm} for
each class. Scored regions are selected/discarded by comparing each
region to other regions within the same class and rejecting them if
there exists another region with a higher score and greater \gls{iou}
than a threshold. The linear \gls{svm} classifiers are trained to only
label a region as positive if the overlap, as measured by \gls{iou},
is above $0.3$.
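The rejection step described here is non-maximum suppression. A compact illustration of the idea (not the reference implementation):

```python
import numpy as np

def iou(a, b):
    # Boxes given as (x1, y1, x2, y2); intersection over union.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.3):
    # Keep the highest-scoring box and drop every lower-scoring box
    # of the same class whose IoU with it exceeds the threshold.
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        best, rest = order[0], order[1:]
        keep.append(best)
        order = [i for i in rest if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```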
While the approach of generating region proposals is not new, using a
\gls{cnn} purely for feature extraction is. Unfortunately, R-\gls{cnn}
@@ -1132,15 +1132,15 @@ on all levels. \glspl{fpn} are an important building block of many
state-of-the-art object detectors.

A \gls{fpn} first computes the feature pyramid bottom-up with a
scaling step of two. The lower levels capture less semantic information
than the higher levels, but include more spatial information due to
the higher granularity. In a second step, the \gls{fpn} upsamples the
higher levels such that the dimensions of two consecutive layers are
the same. The upsampled top layer is merged with the layer beneath it
via element-wise addition and convolved with a one by one
convolutional layer to reduce channel dimensions and to smooth out
potential artifacts introduced during the upsampling step. The results
of that operation constitute the new \emph{top layer} and the process
continues with the layer below it until the finest resolution feature
map is generated. In this way, the features of the different layers at
different scales are fused to obtain a feature map with high semantic
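A sketch of one such top-down merge step, written to follow the description above (note that in the published architecture the $1\times 1$ convolution is applied laterally before the addition and a $3\times 3$ convolution does the smoothing, so treat this as a simplified reading):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownMerge(nn.Module):
    """Upsample the coarser map by the scaling step of two, add it
    element-wise to the finer map, and smooth the result."""

    def __init__(self, channels):
        super().__init__()
        self.smooth = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, top, below):
        up = F.interpolate(top, scale_factor=2, mode="nearest")
        return self.smooth(up + below)

merge = TopDownMerge(channels=256)
p5 = torch.randn(1, 256, 8, 8)    # coarse, semantically strong
c4 = torch.randn(1, 256, 16, 16)  # finer, spatially detailed
p4 = merge(p5, c4)                # becomes the new top layer
```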
@@ -1216,7 +1216,7 @@ detect smaller and denser objects as well.
The authors report results on \gls{voc} 2007 for their \gls{ssd}300
and \gls{ssd}512 model varieties. The number refers to the size of the
input images. \gls{ssd}300 outperforms Fast R-\gls{cnn} by $1.1$
percentage points (\gls{map} 68\% vs 66.9\%). \gls{ssd}512 outperforms
Faster R-\gls{cnn} by 1.7\% \gls{map}. If trained on the \gls{voc}
2007, 2012 and \gls{coco} train sets, \gls{ssd}512 achieves a
@@ -1343,7 +1343,7 @@ The idea of automatic generation of feature maps via \glspl{ann} gave
rise to \glspl{cnn}. Early \glspl{cnn} \cite{lecun1989} were mostly
discarded for practical applications because they require much more
data during training than traditional methods and also more processing
power during inference. Passing $224$ by $224$ pixel images to a
\gls{cnn}, as is common today, was simply not feasible if one wanted a
reasonable inference time. With the development of \glspl{gpu} and
supporting software such as the \gls{cuda} toolkit, it was possible to
@@ -1367,24 +1367,24 @@ function. The error function with which the weights are updated is
The architecture of LeNet-5 is composed of two convolutional layers,
two pooling layers and a dense block of three fully-connected
layers. The input image is a grayscale image of $32$ by $32$
pixels. The first convolutional layer generates six feature maps, each
with a scale of $28$ by $28$ pixels. Each feature map is fed to a
pooling layer which effectively downsamples the image by a factor of
two. By aggregating each two by two area in the feature map via
averaging, the authors are more likely to obtain relative (to each
other) instead of absolute positions of the features. To make up for
the loss in spatial resolution, the following convolutional layer
increases the number of feature maps to $16$ which aims to increase
the richness of the learned representations. Another pooling layer
follows which reduces the size of each of the $16$ feature maps to
five by five pixels. A dense block of three fully-connected layers of
120, 84, and 10 neurons, respectively, serves as the actual classifier
in the network. The last layer uses the Euclidean \gls{rbf} to compute
the class an image belongs to (digits 0--9).
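The description maps almost directly onto code; a sketch in PyTorch, with a plain linear output layer standing in for the original Euclidean RBF layer:

```python
import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # 32x32 input -> 6 maps of 28x28
    nn.Tanh(),
    nn.AvgPool2d(2),                  # average 2x2 areas -> 14x14
    nn.Conv2d(6, 16, kernel_size=5),  # -> 16 maps of 10x10
    nn.Tanh(),
    nn.AvgPool2d(2),                  # -> 16 maps of 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),       # dense block: 120, 84, 10
    nn.Tanh(),
    nn.Linear(120, 84),
    nn.Tanh(),
    nn.Linear(84, 10),                # one score per digit 0-9
)

print(lenet5(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```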
The performance of LeNet-5 was measured on the \gls{mnist} database
which consists of $70000$ labeled images of handwritten digits. The
error rate on the test set is 0.95\%. This result is impressive
considering that character recognition with a \gls{cnn} had not been
done before. However, standard machine learning methods of the time,
@@ -1453,7 +1453,7 @@ second layers of the feature maps present in AlexNet. They identify
multiple problems with their structure such as aliasing artifacts and
a mix of low and high frequency information without any mid
frequencies. These results indicate that the filter size in AlexNet is
too large at $11$ by $11$ and the authors reduce it to seven by
seven. Additionally, they modify the original stride of four to
two. These two changes result in an improvement in the top-5 error
rate of 1.6\% over their own replicated AlexNet result of 18.1\%.
@@ -1461,7 +1461,7 @@ rate of 1.6\% over their own replicated AlexNet result of 18.1\%.
\subsubsection{GoogLeNet}
\label{sssec:theory-googlenet}

GoogLeNet, also known as Inception v1, was proposed by
\textcite{szegedy2015} to increase the depth of the network without
introducing too much additional complexity. Since the relevant parts
of an image can often be of different sizes, but kernels within
@@ -1504,15 +1504,15 @@ non-linearities by having two \glspl{relu} instead of only one. The
authors provide five different networks with an increasing number of
parameters based on these principles. The smallest network has a depth
of eight convolutional layers and three fully-connected layers for the
head ($11$ in total). The largest network has $16$ convolutional and
three fully-connected layers ($19$ in total). The fully-connected
layers are the same for each architecture, only the layout of the
convolutional layers varies.

The deepest network with $19$ layers achieves a top-5 error rate on
\gls{ilsvrc} 2014 of 9\%. If trained with different image scales in
the range of $S \in [256, 512]$, the same network achieves a top-5 error
rate of 8\% (test set at scale $256$). By combining their two largest
architectures and multi-crop as well as dense evaluation, they achieve
an ensemble top-5 error rate of 6.8\%, while their best single network
with multi-crop and dense evaluation results in 7\%, thus beating the
@@ -1522,8 +1522,8 @@ section~\ref{sssec:theory-googlenet}) by 0.9\%.
\subsubsection{ResNet}
\label{sssec:theory-resnet}

The $22$-layer structure of GoogLeNet \cite{szegedy2015} and the
$19$-layer structure of VGGNet \cite{simonyan2015} showed that
\emph{going deeper} is beneficial for achieving better classification
performance. However, the authors of VGGNet already note that stacking
even more layers does not lead to better performance because the model
@@ -1706,13 +1706,13 @@ Estimated 3 pages for this section.
The literature on machine learning in agriculture is broadly divided
into four main areas:~livestock management, soil management, water
management, and crop management \cite{benos2021}. Of those four, water
management only makes up about 10\% of all surveyed papers during the
years 2018--2020. This highlights the potential for research in this
area to have a high real-world impact.

\textcite{su2020} used traditional feature extraction and
preprocessing techniques to train various machine learning models for
classifying water stress for a wheat field. They took top-down images
of the field using an \gls{uav}, segmented wheat pixels from
background pixels and constructed features based on spectral
@@ -1742,47 +1742,49 @@ their results do not transfer well to the other seasons under survey
\textcite{zhuang2017} showed that water stress in maize can be
detected early on and can, therefore, still provide actionable
information before the plants succumb to drought. They installed a
camera which took $640$ by $480$ pixel RGB images every two hours. A
simple linear classifier (\gls{svm}) segmented the image into
foreground and background using the green color channel. The authors
constructed a $14$-dimensional feature space consisting of color and
texture features. A \gls{gbdt} model classified the images into water
stressed and non-stressed and achieved an accuracy of
$\qty{90.39}{\percent}$. Remarkably, the classification was not
significantly impacted by illumination changes throughout the day.

\textcite{an2019} used the ResNet50 model (see
section~\ref{sssec:theory-resnet}) as a basis for transfer learning and
achieved high classification scores (ca. 95\%) on maize. Their model
was fed with $640$ by $480$ pixel images of maize from three different
viewpoints and across three different growth phases. The images were
converted to grayscale which turned out to slightly lower
classification accuracy. Their results also highlight the superiority
of \glspl{dcnn} compared to manual feature extraction and
\glspl{gbdt}.

\textcite{chandel2021} investigated deep learning models in depth by
comparing three well-known \glspl{cnn}. The models under scrutiny were
AlexNet (see section~\ref{sssec:theory-alexnet}), GoogLeNet (see
section~\ref{sssec:theory-googlenet}), and Inception v3. Each model
was trained with a dataset containing images of maize, okra, and
soybean at different stages of growth and under stress and no
stress. The researchers did not include an object detection step
before image classification and compiled a fairly small dataset of
$1200$ images. Of the three models, GoogLeNet beat the other two with
a sizable lead at a classification accuracy of above 94\% for all
three types of crop. The authors attribute its success to its
inherently deeper structure and application of multiple convolutional
layers at different stages. Unfortunately, all of the images were
taken at the same $\ang{45}\pm\ang{5}$ angle and it stands to reason
that the models would perform significantly worse on images taken
under different conditions.

\textcite{ramos-giraldo2020} detected water stress in soybean and corn
crops with a pretrained model based on DenseNet-121 (see
section~\ref{sssec:theory-densenet}). Low-cost cameras deployed in the
field provided the training data over a $70$-day period. They achieved
a classification accuracy for the degree of wilting of 88\%.
In a later study, the same authors \cite{ramos-giraldo2020a} deployed
their machine learning model in the field to test it for production
use. They installed multiple Raspberry Pis with attached Raspberry Pi
Cameras which took images in $\qty{30}{\minute}$ intervals. The
@@ -1797,27 +1799,26 @@ classification scores on corn and soybean with a low-cost setup.
\textcite{azimi2020} demonstrated the efficacy of deep learning models
versus classical machine learning models on chickpea plants. The
authors created their own dataset in a laboratory setting for stressed
and non-stressed plants. They acquired $8000$ images at eight
different angles in total. For the classical machine learning models,
they extracted feature vectors using \gls{sift} and \gls{hog}. The
features are fed into three classical machine learning models:
\gls{svm}, \gls{k-nn}, and a \gls{dt} using the \gls{cart}
algorithm. On the deep learning side, they used their own \gls{cnn}
architecture and the pretrained ResNet-18 (see
section~\ref{sssec:theory-resnet}) model. The accuracy scores for the
classical models were in the range of $\qty{60}{\percent}$ to
$\qty{73}{\percent}$ with the \gls{svm} outperforming the two
others. The \gls{cnn} achieved higher scores at $\qty{72}{\percent}$
to $\qty{78}{\percent}$ and ResNet-18 achieved the highest scores at
$\qty{82}{\percent}$ to $\qty{86}{\percent}$. The results clearly show
the superiority of deep learning over classical machine learning. A
downside of their approach lies in the collection of the images. The
background in all images was uniformly white and the plants were
prominently placed in the center. It should, therefore, not be assumed
that the same classification scores can be achieved on plants in the
field with messy and noisy backgrounds as well as illumination changes
and so forth.
A significant problem in the detection of water stress is posed by the
evolution of indicators across time. Since physiological features such
@@ -2189,27 +2190,28 @@ validation and testing, respectively.
Of the 91479 images, around 10\% were used for the test phase. These
images contain a total of 12238 ground truth
labels. Table~\ref{tab:yolo-metrics} shows precision, recall and the
harmonic mean of both ($\mathrm{F}_1$-score). The results indicate
that the model errs on the side of sensitivity because recall is
higher than precision. Although some detections are not labeled as
plants in the dataset, if there is a labeled plant in the ground truth
data, the chance is high that it will be detected. This behavior is in
line with how the model's detections are handled in practice. The
detections are drawn on the original image and the user is able to
check the bounding boxes visually. If there are wrong detections, the
user can ignore them and focus on the relevant ones instead. A higher
recall will thus serve the user's needs better than a high precision.
\begin{table}[h]
\centering
\begin{tabular}{lrrrr}
\toprule
{} & Precision & Recall & $\mathrm{F}_1$-score & Support \\
\midrule
Plant & 0.547571 & 0.737866 & 0.628633 & 12238 \\
\bottomrule
\end{tabular}
\caption{Precision, recall and $\mathrm{F}_1$-score for the object
detection model.}
\label{tab:yolo-metrics}
\end{table}
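The tabulated $\mathrm{F}_1$ value can be checked directly as the harmonic mean of the listed precision and recall:

```python
precision, recall = 0.547571, 0.737866

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # 0.628633, matching the table
```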
@@ -2330,26 +2332,26 @@ increase again after epoch 27.
\centering
\begin{tabular}{lrrrr}
\toprule
{} & Precision & Recall & $\mathrm{F}_1$-score & Support \\
\midrule
Plant & 0.633358 & 0.702811 & 0.666279 & 12238 \\
\bottomrule
\end{tabular}
\caption{Precision, recall and $\mathrm{F}_1$-score for the
optimized object detection model.}
\label{tab:yolo-metrics-hyp}
\end{table}

Turning to the evaluation of the optimized model on the test dataset,
table~\ref{tab:yolo-metrics-hyp} shows precision, recall and the
$\mathrm{F}_1$-score for the optimized model. Comparing these metrics
with the non-optimized version from table~\ref{tab:yolo-metrics},
precision is significantly higher by more than 8.5\%. Recall, however,
is 3.5\% lower. The $\mathrm{F}_1$-score is higher by more than 3.7\%
which indicates that the optimized model is better overall despite the
lower recall. We feel that the lower recall value is a suitable
trade-off for the substantially higher precision considering that the
non-optimized model's precision is quite low at 0.55.

The precision-recall curves in figure~\ref{fig:yolo-ap-hyp} for the
optimized model show that the model draws looser bounding boxes than
@@ -2438,7 +2440,7 @@ The random search was run for 138 iterations which equates to a 75\%
probability that the best solution lies within 1\% of the theoretical
maximum~\eqref{eq:opt-prob}. Figure~\ref{fig:classifier-hyp-results}
shows three of the eight parameters and their impact on a high
$\mathrm{F}_1$-score. \gls{sgd} has less variation in its results than
Adam~\cite{kingma2017} and manages to provide eight out of the ten
best results. The number of epochs to train for was chosen based on
the observation that almost all configurations converge well before
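The 75\% figure is consistent with the usual independence bound for random search (assuming \eqref{eq:opt-prob} has this form): if each trial independently lands in the top 1\% of configurations with probability $0.01$, then after $138$ trials

```latex
P = 1 - (1 - 0.01)^{138} = 1 - 0.99^{138} \approx 1 - 0.25 = 0.75.
```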
@@ -2456,17 +2458,17 @@ figure~\ref{fig:classifier-training-metrics}.
\includegraphics{graphics/classifier-hyp-metrics.pdf}
\caption[Classifier hyper-parameter optimization results.]{This
figure shows three of the eight hyper-parameters and their
performance measured by the $\mathrm{F}_1$-score during 138
trials. Differently colored markers show the batch size with
darker colors representing a larger batch size. The type of marker
(circle or cross) shows which optimizer was used. The x-axis shows
the learning rate on a logarithmic scale. In general, a learning
rate between 0.003 and 0.01 results in more robust and better
$\mathrm{F}_1$-scores. Larger batch sizes more often lead to
better performance as well. As for the type of optimizer,
\gls{sgd} produced the best iteration with an $\mathrm{F}_1$-score
of 0.9783. Adam tends to require more customization of its
parameters than \gls{sgd} to achieve good results.}
\label{fig:classifier-hyp-results}
\end{figure}
@@ -2477,14 +2479,15 @@ chance due to a coincidentally advantageous train/test split, we
perform stratified $10$-fold cross validation on the dataset. Each
fold contains 90\% training and 10\% test data and was trained for 25
epochs. Figure~\ref{fig:classifier-hyp-roc} shows the performance of
the epoch with the highest $\mathrm{F}_1$-score of each fold as
measured against the test split. The mean \gls{roc} curve provides a
robust metric for a classifier's performance because it averages out
the variability of the evaluation. Each fold manages to achieve at
least an \gls{auc} of 0.94, while the best fold reaches 0.98. The mean
\gls{roc} has an \gls{auc} of 0.96 with a standard deviation of
0.02. These results indicate that the model is accurately predicting
the correct class and is robust against variations in the training
set.
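A minimal sketch of this splitting scheme with scikit-learn; the arrays are placeholders for the real dataset:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(1000, 32)       # placeholder features
y = np.random.randint(0, 2, 1000)  # placeholder binary labels

# Stratification preserves the class ratio of y in every fold, so
# each 90/10 split keeps the same healthy/stressed balance.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # ... train for 25 epochs and keep the best epoch's F1-score ...
```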
\begin{table}
\centering
@@ -2508,47 +2511,49 @@ is robust against variations in the training set.
\includegraphics{graphics/classifier-hyp-folds-roc.pdf}
\caption[Mean \gls{roc} and variability of hyper-parameter-optimized
model.]{This plot shows the \gls{roc} curve for the epoch with the
highest $\mathrm{F}_1$-score of each fold as well as the
\gls{auc}. To get a less variable performance metric of the
classifier, the mean \gls{roc} curve is shown as a thick line and
the variability is shown in gray. The overall mean \gls{auc} is
0.96 with a standard deviation of 0.02. The best-performing fold
reaches an \gls{auc} of 0.99 and the worst an \gls{auc} of
0.94. The black dashed line indicates the performance of a
classifier which picks classes at random
($\mathrm{\gls{auc}} = 0.5$). The shapes of the \gls{roc} curves
show that the classifier performs well and is robust against
variations in the training set.}
\label{fig:classifier-hyp-roc}
\end{figure}

The classifier shows good performance so far, but care has to be taken
to not overfit the model to the training set. Comparing the
$\mathrm{F}_1$-score during training with the $\mathrm{F}_1$-score
during testing gives insight into when the model tries to increase its
performance during training at the expense of
generalizability. Figure~\ref{fig:classifier-hyp-folds} shows the
$\mathrm{F}_1$-scores of each epoch and fold. The classifier converges
quickly to 1 for the training set at which point it experiences a
slight drop in generalizability. Training the model for at most five
epochs is sufficient because there are generally no improvements
afterwards. The best-performing epoch for each fold is between the
second and fourth epoch which is just before the model achieves an
$\mathrm{F}_1$-score of 1 on the training set.
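Early stopping, mentioned in the caption below, then amounts to keeping the weights of the best test-set epoch; a sketch for a PyTorch-style model, where `train_one_epoch` and `evaluate_f1` are caller-supplied placeholders:

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate_f1,
                              max_epochs=25, patience=2):
    # Keep the weights of the best epoch and stop once the F1-score
    # has not improved for `patience` consecutive epochs.
    best_f1, best_state, bad_epochs = 0.0, None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        f1 = evaluate_f1(model)
        if f1 > best_f1:
            best_f1, bad_epochs = f1, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    model.load_state_dict(best_state)
    return best_f1
```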
\begin{figure}
\centering
\includegraphics[width=.9\textwidth]{graphics/classifier-hyp-folds-f1.pdf}
\caption[$\mathrm{F}_1$-score of stratified $10$-fold cross
validation.]{These plots show the $\mathrm{F}_1$-score during
training as well as testing for each of the folds. The classifier
converges to 1 by the third epoch during the training phase, which
might indicate overfitting. However, the performance during
testing increases until epoch three in most cases and then
stabilizes at approximately 2--3\% lower than the best epoch. We
believe that the third, or in some cases fourth, epoch is
detrimental to performance and results in overfitting, because the
model achieves an $\mathrm{F}_1$-score of 1 for the training set,
but that gain does not transfer to the test set. Early stopping
during training alleviates this problem.}
\label{fig:classifier-hyp-folds}
\end{figure}
@@ -2655,7 +2660,7 @@ bounding boxes of healthy plants and 494 of stressed plants.
\centering
\begin{tabular}{lrrrr}
\toprule
{} & Precision & Recall & $\mathrm{F}_1$-score & Support \\
\midrule
Healthy & 0.665 & 0.554 & 0.604 & 766 \\
Stressed & 0.639 & 0.502 & 0.562 & 494 \\
@@ -2664,15 +2669,17 @@ bounding boxes of healthy plants and 494 of stressed plants.
Weighted avg & 0.655 & 0.533 & 0.588 & 1260 \\
\bottomrule
\end{tabular}
\caption{Precision, recall and $\mathrm{F}_1$-score for the
aggregate model.}
\label{tab:model-metrics}
\end{table}

Table~\ref{tab:model-metrics} shows precision, recall and the
$\mathrm{F}_1$-score for both classes \emph{Healthy} and
\emph{Stressed}. Precision is higher than recall for both classes and
the $\mathrm{F}_1$-score is at 0.59. Unfortunately, these values do
not take the accuracy of bounding boxes into account and thus have
only limited expressive power.

Figure~\ref{fig:aggregate-ap} shows the precision and recall curves
for both classes at different \gls{iou} thresholds. The left plot
@@ -2716,7 +2723,7 @@ section~\ref{ssec:aggregate-model}.
\centering
\begin{tabular}{lrrrr}
\toprule
{} & Precision & Recall & $\mathrm{F}_1$-score & Support \\
\midrule
Healthy & 0.711 & 0.555 & 0.623 & 766 \\
Stressed & 0.570 & 0.623 & 0.596 & 494 \\
@@ -2725,22 +2732,23 @@ section~\ref{ssec:aggregate-model}.
Weighted avg & 0.656 & 0.582 & 0.612 & 1260 \\
\bottomrule
\end{tabular}
\caption{Precision, recall and $\mathrm{F}_1$-score for the
optimized aggregate model.}
\label{tab:model-metrics-hyp}
\end{table}

Table~\ref{tab:model-metrics-hyp} shows precision, recall and
$\mathrm{F}_1$-score for the optimized model on the same test dataset
of 640 images. Most of the metrics are better for the optimized
model. In particular, precision for the healthy class could be
improved significantly while recall remains at the same level. This
results in a better $\mathrm{F}_1$-score for the healthy
class. Precision for the stressed class is lower with the optimized
model, but recall is significantly higher (0.502 vs. 0.623). The
higher recall results in a 3\% gain for the $\mathrm{F}_1$-score in
the stressed class. Overall, precision is the same but recall has
improved significantly, which also results in a noticeable improvement
for the average $\mathrm{F}_1$-score across both classes.
\begin{figure}
\centering