\section{Hyperparameter Optimization}
\label{sec:background-hypopt}

While a network is learning, the parameters of its layers are
updated. These parameters are \emph{learnable} in the sense that
changing them should bring the model closer to solving a
problem. Updating these parameters happens during the
learning/training phase. Hyperparameters, on the other hand, are not
included in the learning process because they are fixed before the
model starts to train. They are fixed because hyperparameters concern
the structure, architecture and learning parameters of the model, and
without having those in place, a model cannot start training.

Model designers have to carefully define values for a wide range of
hyperparameters. Which hyperparameters have to be set is determined by
the type of model being used. A \gls{svm}, for example, has a penalty
parameter $C$ which indicates to the model how lenient it should be
when misclassifying training examples. The type of kernel to use is
also a hyperparameter for any \gls{svm} and can only be chosen by
looking at the distribution of the underlying data. In neural networks
the range of hyperparameters is even greater because every part of the
network architecture, such as how many layers to stack, which layers
to stack, which kernel sizes to use in each \gls{cnn} layer and which
activation function(s) to use between the layers, is a parameter which
can be altered. Finding the best combination of some or all of the
available hyperparameters is called \emph{hyperparameter tuning}.
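
As an illustration, the following Python snippet, assuming
scikit-learn is available, shows that the penalty parameter $C$ and
the kernel of a \gls{svm} are fixed when the model is constructed,
before any training takes place; the data set here is synthetic.

\begin{verbatim}
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hyperparameters such as C and the kernel are fixed *before*
# training, unlike the learnable parameters updated by fit().
X, y = make_classification(n_samples=100, random_state=0)
clf = SVC(C=1.0, kernel="rbf")  # penalty parameter C, kernel choice
clf.fit(X, y)
\end{verbatim}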

Hyperparameter tuning can be, and often is, done manually by
researchers who select values which \emph{have been known to work
well}. This approach, while it works to some extent, is not optimal
because adhering to \emph{best practice} precludes parameter
configurations which would be closer to optimal for a given data
set. Furthermore, manual tuning requires a deep understanding of the
model itself and of how each parameter influences it. Biases present
in a researcher's understanding are detrimental to finding optimal
hyperparameters, and the number of possible combinations can quickly
become intractable. Automated methods that search the hyperparameter
space instead offer an unbiased and more efficient approach to
hyperparameter tuning. This type of algorithmic search is called
\emph{hyperparameter optimization}.

\subsection{Grid Search}
\label{ssec:grid-search}

There are multiple possible strategies to opt for when optimizing
hyperparameters. The straightforward approach is grid search. In grid
search, all hyperparameters are discretized and all possible
combinations are mapped to a search space. The search space is then
sampled for configurations at evenly spaced points and the resulting
vectors of hyperparameter values are evaluated. For example, if a
model has seven hyperparameters and three of those can take on a
continuous value, these three variables have to be discretized. In
practical terms this means that the model engineer chooses suitable
discrete values for said hyperparameters. Once all hyperparameters are
discrete, all possible combinations of the hyperparameters are
evaluated. If each of the seven hyperparameters has three discrete
values, the number of possible combinations is
\begin{equation}
  \label{eq:hypopt-nums}
  3\cdot3\cdot3\cdot3\cdot3\cdot3\cdot3 = 3^{7} = 2187.
\end{equation}

For this example, evaluating $2187$ possible combinations can already
be intractable depending on the time required for each run. Further,
grid search requires that the resolution of the grid be determined
beforehand. If the points on the grid (combinations) are spaced too
far apart, the chance of finding a global optimum is lower than if the
grid is dense. However, a dense grid results in a higher number of
possible combinations, and thus more time is required for an
exhaustive search. Additionally, grid search suffers from the
\emph{curse of dimensionality} because the number of evaluations
scales exponentially with the number of hyperparameters.
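
To make the procedure concrete, the following Python sketch
exhaustively enumerates a small, hypothetical search space; the
hyperparameter names and the \texttt{evaluate} function are
illustrative stand-ins for an actual training and validation run.

\begin{verbatim}
from itertools import product

# Hypothetical discretized search space (names are illustrative).
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
    "num_layers": [2, 3, 4],
}

def evaluate(config):
    # Stand-in for training the model with `config` and
    # returning its validation score.
    return -abs(config["learning_rate"] - 0.01) \
           - abs(config["num_layers"] - 3)

best_score, best_config = float("-inf"), None
keys = list(grid)
for values in product(*grid.values()):  # 3*3*3 = 27 combinations
    config = dict(zip(keys, values))
    score = evaluate(config)
    if score > best_score:
        best_score, best_config = score, config
\end{verbatim}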

\subsection{Random Search}
\label{ssec:hypopt-random-search}

Random search \cite{pinto2009} is an alternative to grid search which
often finds configurations that are similar to or better than those
obtained with grid search in the same amount of time
\cite{bergstra2012}. Random search performs especially well in
high-dimensional settings because the hyperparameter response surface
is often of \emph{low effective dimensionality}
\cite{bergstra2012}. That is, a small number of hyperparameters
disproportionately affects the performance of the resulting model
while the rest have a negligible effect. We use random search in this
work to improve the hyperparameters of our classification model.
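
A minimal sketch of random search over the same hypothetical search
space as above: continuous hyperparameters are sampled directly, here
log-uniformly for the learning rate, so no discretization is required.

\begin{verbatim}
import random

random.seed(0)

def evaluate(config):
    # Stand-in for an actual training and validation run.
    return -abs(config["learning_rate"] - 0.01) \
           - abs(config["num_layers"] - 3)

def sample_config():
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),  # log-uniform
        "batch_size": random.choice([16, 32, 64]),
        "num_layers": random.randint(2, 4),
    }

best_score, best_config = float("-inf"), None
for _ in range(30):  # fixed evaluation budget
    config = sample_config()
    score = evaluate(config)
    if score > best_score:
        best_score, best_config = score, config
\end{verbatim}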

\subsection{Evolution Strategies}
\label{ssec:hypopt-evo}

Evolution strategies follow a population-based model where the search
strategy starts from initial random configurations and evolves the
hyperparameters through \emph{mutation} and \emph{crossover}. Mutation
randomly changes the value of a hyperparameter and crossover creates a
new configuration by mixing the values of two
configurations. Hyperparameter optimization with evolutionary
strategies roughly goes through the following stages \cite{bischl2023}.

\begin{enumerate}
\item Set the hyperparameters to random initial values and create a
  starting population of configurations.
\item Evaluate each configuration.
\item Rank all configurations according to a fitness function.
\item Select the best-performing configurations as \emph{parents}.
\item Create child configurations from the parent configurations by
  mutation and crossover.
\item Evaluate the child configurations.
\item Go to step three and repeat the process until a termination
  condition is reached.
\end{enumerate}

This strategy is more efficient than grid search or random search, but
it requires a substantial number of iterations to reach good solutions
and can thus be too expensive for hyperparameter optimization
\cite{bischl2023}. We use an evolution strategy based on a genetic
algorithm in this work to optimize the hyperparameters of our object
detection model. A minimal sketch of the evolutionary loop is given
below.
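
The following Python sketch implements the stages above as a simple
genetic algorithm; the search space, the fitness function and all
numeric settings (population size, parent count, generations) are
hypothetical and not the configuration used in this work.

\begin{verbatim}
import random

random.seed(0)
SPACE = {"log_learning_rate": (-4.0, -1.0),
         "num_layers": (2.0, 6.0)}

def random_config():  # step 1: random initial values
    return {k: random.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}

def fitness(config):
    # Stand-in for training the model and measuring validation accuracy.
    return -abs(config["log_learning_rate"] + 2.0) \
           - abs(config["num_layers"] - 4.0)

def mutate(config):
    # Randomly perturb one hyperparameter, clipped to its range.
    key = random.choice(list(SPACE))
    lo, hi = SPACE[key]
    child = dict(config)
    child[key] = min(hi, max(lo, child[key] + random.gauss(0, 0.3)))
    return child

def crossover(a, b):
    # Mix the values of two parent configurations.
    return {k: random.choice((a[k], b[k])) for k in SPACE}

population = [random_config() for _ in range(10)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)  # steps 2-3: rank
    parents = population[:4]                    # step 4: select
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children             # steps 5-7: iterate
best = max(population, key=fitness)
\end{verbatim}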

\section{Related Work}
\label{sec:related-work}