Add hyperparameter section

Tobias Eidelpes 2023-11-22 11:00:50 +01:00
parent 2267b5ef25
commit 785435e82c


@@ -1661,14 +1661,124 @@ train a classifier as well as an object detection model.
\section{Hyperparameter Optimization}
\label{sec:background-hypopt}

While a network is learning, the parameters of its layers are
updated. These parameters are \emph{learnable} in the sense that
changing them should bring the model closer to solving a
problem. Updating these parameters happens during the
learning/training phase. Hyperparameters, on the other hand, are not
included in the learning process because they are fixed before the
model starts to train. They are fixed because hyperparameters define
the structure and architecture of the model as well as the parameters
of the learning process itself; without them in place, a model cannot
start training.

Model designers have to carefully define values for a wide range of
hyperparameters. Which hyperparameters have to be set is determined by
the type of model being used. A \gls{svm}, for example, has a penalty
parameter $C$ which controls how strongly the model is penalized for
misclassifying training examples. The type of kernel is another
hyperparameter of any \gls{svm}, and the question of which kernel to
choose can only be answered by looking at the distribution of the
underlying data. In neural networks the range of hyperparameters is
even greater, because every part of the network architecture is a
parameter that can be altered: how many layers to stack, which layers
to stack, which kernel sizes to use in each \gls{cnn} layer and which
activation function(s) to place between the layers. Finding the best
combination of some or all of the available hyperparameters is called
\emph{hyperparameter tuning}.
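
To make the distinction concrete, the following minimal Python sketch
uses scikit-learn's \texttt{SVC} (chosen purely for illustration; it
is not the implementation used in this work): the hyperparameters $C$
and the kernel are fixed before training, while the learnable
parameters are only determined during the call to \texttt{fit}.

\begin{verbatim}
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Hyperparameters: fixed *before* training starts.
model = SVC(C=1.0, kernel="rbf")

# Learnable parameters (support vectors, dual coefficients)
# are only determined *during* training.
model.fit(X, y)
print(model.support_vectors_.shape)
\end{verbatim}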
Hyperparameter tuning can be, and often is, done manually: researchers
select values which are \emph{known to work well}. While this approach
works to some extent, it is not optimal, because adhering to
\emph{best practice} precludes parameter configurations which would be
closer to optimal for a given data set. Furthermore, manual tuning
requires a deep understanding of the model itself and of how each
parameter influences it. Biases in a researcher's understanding are
detrimental to finding optimal hyperparameters, and the number of
possible combinations quickly becomes intractable. Automated methods
for searching the hyperparameter space instead offer an unbiased and
more efficient approach to hyperparameter tuning. This type of
algorithmic search is called \emph{hyperparameter optimization}.
\subsection{Grid Search}
\label{ssec:grid-search}
There are multiple possible strategies for optimizing
hyperparameters. The most straightforward approach is grid search. In
grid search, all hyperparameters are discretized and all possible
combinations are mapped to a search space. The search space is then
sampled at evenly spaced points and the resulting vectors of
hyperparameter values are evaluated. For example, if a model has seven
hyperparameters and three of those take on continuous values, these
three variables have to be discretized. In practical terms this means
that the model engineer chooses suitable discrete values for said
hyperparameters. Once all hyperparameters are discrete, every possible
combination is evaluated. If each of the seven hyperparameters has
three discrete values, the number of possible combinations is
\begin{equation}
\label{eq:hypopt-nums}
3\cdot3\cdot3\cdot3\cdot3\cdot3\cdot3 = 3^{7} = 2187.
\end{equation}
For this example, evaluating $2187$ possible combinations can already
be intractable depending on the time required for each run. Further,
grid search requires the resolution of the grid to be fixed
beforehand. If the points on the grid (combinations) are spaced too
far apart, the chance of finding a global optimum is lower than with a
dense grid. However, a dense grid results in a higher number of
possible combinations and thus more time is required for an exhaustive
search. Additionally, grid search suffers from the \emph{curse of
dimensionality}: the number of evaluations scales exponentially with
the number of hyperparameters.
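
The following minimal Python sketch illustrates the combinatorial
growth from Equation~\ref{eq:hypopt-nums}; the hyperparameter names
and values are purely illustrative and do not correspond to a model
used in this work.

\begin{verbatim}
from itertools import product

# Hypothetical search grid: seven hyperparameters,
# three discrete values each (names are illustrative).
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size":    [16, 32, 64],
    "momentum":      [0.8, 0.9, 0.99],
    "weight_decay":  [0.0, 1e-4, 1e-2],
    "dropout":       [0.0, 0.25, 0.5],
    "num_layers":    [2, 4, 8],
    "kernel_size":   [3, 5, 7],
}

# Exhaustive grid search evaluates every combination.
configs = [dict(zip(grid, values))
           for values in product(*grid.values())]
print(len(configs))  # 3^7 = 2187
\end{verbatim}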
\subsection{Random Search}
\label{ssec:hypopt-random-search}
Random search \cite{pinto2009} is an alternative to grid search which
often finds configurations that are as good as or better than those
obtained with grid search in the same amount of time
\cite{bergstra2012}. Random search performs especially well in
high-dimensional settings because the hyperparameter response surface
is often of \emph{low effective dimensionality}
\cite{bergstra2012}. That is, a small number of hyperparameters
disproportionately affects the performance of the resulting model,
while the rest have a negligible effect. We use random search in this
work to improve the hyperparameters of our classification model.
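
A minimal sketch of random search follows, with illustrative
hyperparameter ranges and a placeholder evaluation function standing
in for an actual training run; neither corresponds to our actual
setup.

\begin{verbatim}
import random

random.seed(0)

def sample_config():
    # Each hyperparameter is drawn independently; continuous
    # values need no prior discretization.
    return {
        "learning_rate": 10 ** random.uniform(-4, -2),
        "momentum": random.uniform(0.8, 0.99),
        "batch_size": random.choice([16, 32, 64]),
    }

def evaluate(config):
    # Placeholder for training the model with `config` and
    # returning a validation score.
    return -abs(config["learning_rate"] - 1e-3)

# Evaluate a fixed budget of random configurations, keep the best.
candidates = [sample_config() for _ in range(50)]
best = max(candidates, key=evaluate)
print(best)
\end{verbatim}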
\subsection{Evolution Strategies}
\label{ssec:hypopt-evo}
Evolution strategies follow a population-based model in which the
search starts from random initial configurations and evolves the
hyperparameters through \emph{mutation} and \emph{crossover}. Mutation
randomly changes the value of a hyperparameter; crossover creates a
new configuration by mixing the values of two existing
configurations. Hyperparameter optimization with evolution strategies
roughly goes through the following stages \cite{bischl2023}.
\begin{enumerate}
\item Set the hyperparameters to random initial values and create a
  starting population of configurations.
\item Evaluate each configuration.
\item Rank all configurations according to a fitness function.
\item Select the best-performing configurations as \emph{parents}.
\item Create child configurations from the parents by mutation and
  crossover.
\item Evaluate the child configurations.
\item Go to step three and repeat the process until a termination
  condition is reached.
\end{enumerate}
This strategy is more efficient than grid search or random search, but
requires a substantial number of iterations to find good solutions and
can thus be too expensive for hyperparameter optimization
\cite{bischl2023}. We use an evolution strategy based on a genetic
algorithm in this work to optimize the hyperparameters of our object
detection model.
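
The following minimal sketch follows the stages listed above with
illustrative fitness, mutation and crossover operators; it is not the
genetic algorithm used by our object detection model.

\begin{verbatim}
import random

random.seed(0)

def random_config():
    return {"learning_rate": 10 ** random.uniform(-4, -2),
            "momentum": random.uniform(0.8, 0.99)}

def fitness(config):
    # Placeholder for training and scoring a model.
    return (-abs(config["learning_rate"] - 1e-3)
            - abs(config["momentum"] - 0.9))

def mutate(config):
    # Randomly perturb one hyperparameter.
    child = dict(config)
    key = random.choice(list(child))
    child[key] *= random.uniform(0.8, 1.2)
    return child

def crossover(a, b):
    # Mix the values of two parent configurations.
    return {key: random.choice([a[key], b[key]]) for key in a}

population = [random_config() for _ in range(20)]
for generation in range(10):      # termination: fixed budget
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:5]          # best configurations as parents
    population = parents + [
        mutate(crossover(*random.sample(parents, 2)))
        for _ in range(15)
    ]
best = max(population, key=fitness)
print(best)
\end{verbatim}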
\section{Related Work}
\label{sec:related-work}