Move AlexNet to classification section

Tobias Eidelpes 2023-11-08 10:49:35 +01:00
parent 89404df619
commit f72e29d6ad

@@ -978,34 +978,11 @@ availability of the 12 million labeled images in the ImageNet dataset
 being able to use more data to train models. Earlier models had
 difficulties with making use of the large dataset since training was
 unfeasible. AlexNet, however, provided an architecture which was able
-to be trained on two \glspl{gpu} within 6 days.
-
-AlexNet's main contributions are the use of \glspl{relu}, training on
-multiple \glspl{gpu}, \gls{lrn} and overlapping pooling
-\cite{krizhevsky2012}. As mentioned in
-section~\ref{sssec:theory-relu}, \glspl{relu} introduce non-linearity
-into the network. Instead of using the traditional non-linear
-activation function $\tanh$, where the output is bounded between $-1$
-and $1$, \glspl{relu} allow the output layers to grow as high as
-training requires it. Normalization before an activation function is
-usually used to prevent the neuron from saturating, as would be the
-case with $\tanh$. Even though \glspl{relu} do not suffer from
-saturation, the authors found that \gls{lrn} reduces the top-1 error
-rate by 1.4\% \cite{krizhevsky2012}. Overlapping pooling, in contrast
-to regular pooling, does not easily accept the dominant pixel values
-per window. By smoothing out the pooled information, bias is reduced
-and networks are slightly more resilient to overfitting. Overlapping
-pooling reduces the top-1 error rate by 0.4\%
-\cite{krizhevsky2012}. In aggregate, these improvements result in a
-top-5 error rate of below 25\% at 16.4\%.
-
-These results demonstrated that \glspl{cnn} can extract highly
-relevant feature representations from images. While AlexNet was only
-concerned with classification of images, it did not take long for
-researchers to apply \glspl{cnn} to the problem of object
-detection. Object detection networks from 2014 onward either follow a
-\emph{one-stage} or \emph{two-stage} detection approach. The following
-sections go into detail about each model category.
+to be trained on two \glspl{gpu} within 6 days. For an in-depth
+overview of AlexNet see section~\ref{sssec:theory-alexnet}. Object
+detection networks from 2014 onward either follow a \emph{one-stage}
+or \emph{two-stage} detection approach. The following sections go into
+detail about each model category.
 
 \subsection{Two-Stage Detectors}
 \label{ssec:theory-two-stage}
@@ -1414,6 +1391,33 @@ demonstrated by \textcite{lecun1998}. Only in 2012
 section~\ref{ssec:theory-dl-based}) and since then most
 state-of-the-art image classification methods have used them.
 
+\subsubsection{AlexNet}
+\label{sssec:theory-alexnet}
+
+AlexNet's main contributions are the use of \glspl{relu}, training on
+multiple \glspl{gpu}, \gls{lrn} and overlapping pooling
+\cite{krizhevsky2012}. As mentioned in
+section~\ref{sssec:theory-relu}, \glspl{relu} introduce non-linearity
+into the network. Instead of using the traditional non-linear
+activation function $\tanh$, whose output is bounded between $-1$
+and $1$, \glspl{relu} allow activations to grow as large as
+training requires. Normalization before an activation function is
+usually used to prevent the neuron from saturating, as would be the
+case with $\tanh$. Even though \glspl{relu} do not suffer from
+saturation, the authors found that \gls{lrn} reduces the top-1 error
+rate by 1.4\% \cite{krizhevsky2012}. Overlapping pooling, in contrast
+to regular pooling, uses a stride smaller than the pooling window, so
+neighbouring windows share pixels. By smoothing out the pooled
+information, bias is reduced and networks are slightly more resilient
+to overfitting. Overlapping pooling reduces the top-1 error rate by
+0.4\% \cite{krizhevsky2012}. In aggregate, these improvements result
+in a top-5 error rate of 16.4\%, well below 25\%.
+
+These results demonstrated that \glspl{cnn} can extract highly
+relevant feature representations from images. While AlexNet was only
+concerned with the classification of images, it did not take long for
+researchers to apply \glspl{cnn} to the problem of object detection.
+
 \subsubsection{ZFNet}
 \label{sssec:theory-zfnet}
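
For reference, the three mechanisms the moved paragraph names can be written out explicitly. The following is a minimal sketch using the notation of Krizhevsky et al. (2012) rather than the thesis' own macros, so the symbols and constants below come from that paper, not from the diff above:

% Reference sketch (notation of Krizhevsky et al., 2012), not thesis text.
% ReLU is unbounded above, while tanh saturates in (-1, 1):
\[ f(x) = \max(0, x), \qquad \tanh(x) \in (-1, 1) \]
% Local response normalization over n adjacent kernel maps at the same
% spatial position (x, y); the paper uses k = 2, n = 5, alpha = 1e-4,
% beta = 0.75 and reports the 1.4% top-1 improvement quoted above:
\[ b^{i}_{x,y} = a^{i}_{x,y} \Big/ \Bigl( k + \alpha
   \sum_{j=\max(0,\,i-n/2)}^{\min(N-1,\,i+n/2)} \bigl(a^{j}_{x,y}\bigr)^{2}
   \Bigr)^{\beta} \]
% Overlapping pooling: window size z larger than stride s (AlexNet uses
% z = 3, s = 2), so neighbouring pooling windows share pixels;
% non-overlapping pooling corresponds to s = z.

With s < z each input value contributes to more than one pooled output, which is the smoothing effect the paragraph alludes to.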