Move AlexNet to classification section
parent 89404df619
commit f72e29d6ad
@@ -978,34 +978,11 @@ availability of the 12 million labeled images in the ImageNet dataset
 being able to use more data to train models. Earlier models had
 difficulties with making use of the large dataset since training was
 unfeasible. AlexNet, however, provided an architecture which was able
-to be trained on two \glspl{gpu} within 6 days.
-
-AlexNet's main contributions are the use of \glspl{relu}, training on
-multiple \glspl{gpu}, \gls{lrn} and overlapping pooling
-\cite{krizhevsky2012}. As mentioned in
-section~\ref{sssec:theory-relu}, \glspl{relu} introduce non-linearity
-into the network. Instead of using the traditional non-linear
-activation function $\tanh$, where the output is bounded between $-1$
-and $1$, \glspl{relu} allow the output layers to grow as high as
-training requires it. Normalization before an activation function is
-usually used to prevent the neuron from saturating, as would be the
-case with $\tanh$. Even though \glspl{relu} do not suffer from
-saturation, the authors found that \gls{lrn} reduces the top-1 error
-rate by 1.4\% \cite{krizhevsky2012}. Overlapping pooling, in contrast
-to regular pooling, does not easily accept the dominant pixel values
-per window. By smoothing out the pooled information, bias is reduced
-and networks are slightly more resilient to overfitting. Overlapping
-pooling reduces the top-1 error rate by 0.4\%
-\cite{krizhevsky2012}. In aggregate, these improvements result in a
-top-5 error rate of below 25\% at 16.4\%.
-
-These results demonstrated that \glspl{cnn} can extract highly
-relevant feature representations from images. While AlexNet was only
-concerned with classification of images, it did not take long for
-researchers to apply \glspl{cnn} to the problem of object
-detection. Object detection networks from 2014 onward either follow a
-\emph{one-stage} or \emph{two-stage} detection approach. The following
-sections go into detail about each model category.
+to be trained on two \glspl{gpu} within 6 days. For an in depth
+overview of AlexNet see section~\ref{sssec:theory-alexnet}. Object
+detection networks from 2014 onward either follow a \emph{one-stage}
+or \emph{two-stage} detection approach. The following sections go into
+detail about each model category.
 
 \subsection{Two-Stage Detectors}
 \label{ssec:theory-two-stage}
@@ -1414,6 +1391,33 @@ demonstrated by \textcite{lecun1998}. Only in 2012
 section~\ref{ssec:theory-dl-based}) and since then most
 state-of-the-art image classification methods have used them.
 
+\subsubsection{AlexNet}
+\label{sssec:theory-alexnet}
+
+AlexNet's main contributions are the use of \glspl{relu}, training on
+multiple \glspl{gpu}, \gls{lrn} and overlapping pooling
+\cite{krizhevsky2012}. As mentioned in
+section~\ref{sssec:theory-relu}, \glspl{relu} introduce non-linearity
+into the network. Instead of using the traditional non-linear
+activation function $\tanh$, where the output is bounded between $-1$
+and $1$, \glspl{relu} allow the output layers to grow as high as
+training requires it. Normalization before an activation function is
+usually used to prevent the neuron from saturating, as would be the
+case with $\tanh$. Even though \glspl{relu} do not suffer from
+saturation, the authors found that \gls{lrn} reduces the top-1 error
+rate by 1.4\% \cite{krizhevsky2012}. Overlapping pooling, in contrast
+to regular pooling, does not easily accept the dominant pixel values
+per window. By smoothing out the pooled information, bias is reduced
+and networks are slightly more resilient to overfitting. Overlapping
+pooling reduces the top-1 error rate by 0.4\%
+\cite{krizhevsky2012}. In aggregate, these improvements result in a
+top-5 error rate of below 25\% at 16.4\%.
+
+These results demonstrated that \glspl{cnn} can extract highly
+relevant feature representations from images. While AlexNet was only
+concerned with the classification of images, it did not take long for
+researchers to apply \glspl{cnn} to the problem of object detection.
+
 \subsubsection{ZFNet}
 \label{sssec:theory-zfnet}
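The moved subsection leans on two mechanisms that are easier to follow with the formulas written out. The following is a minimal LaTeX sketch and is not part of this commit; the symbols $a^i_{x,y}$, $b^i_{x,y}$, $k$, $n$, $\alpha$ and $\beta$ are taken from \textcite{krizhevsky2012} rather than from the thesis itself. A \gls{relu} is unbounded above while $\tanh$ saturates:
\[
  \operatorname{ReLU}(x) = \max(0, x) \in [0, \infty), \qquad
  \tanh(x) \in (-1, 1).
\]
\gls{lrn} normalizes the activity $a^i_{x,y}$ of kernel $i$ at position $(x, y)$ across $n$ neighbouring kernel maps,
\[
  b^i_{x,y} = a^i_{x,y} \Bigg/ \left( k + \alpha
  \sum_{j=\max(0,\, i - n/2)}^{\min(N-1,\, i + n/2)} \big(a^j_{x,y}\big)^2 \right)^{\beta},
\]
where $N$ is the number of kernels in the layer; \textcite{krizhevsky2012} report $k = 2$, $n = 5$, $\alpha = 10^{-4}$ and $\beta = 0.75$.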
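The overlapping-pooling point can likewise be checked numerically. Below is a short, self-contained NumPy sketch, illustrative only: the max_pool2d helper and the 13x13 input are made up for this example and come neither from the thesis nor from the AlexNet paper. It contrasts regular pooling with AlexNet-style pooling, where the 3x3 window is larger than the stride of 2.

import numpy as np

def max_pool2d(x, window, stride):
    """Naive 2-D max pooling over a single-channel feature map."""
    h, w = x.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + window, c:c + window].max()
    return out

rng = np.random.default_rng(0)
fmap = rng.standard_normal((13, 13))

# Regular (non-overlapping) pooling: the stride equals the window size,
# so every input value falls into exactly one pooling region.
regular = max_pool2d(fmap, window=2, stride=2)

# Overlapping pooling as used by AlexNet: window 3, stride 2, so
# neighbouring regions share a row/column of inputs and a single
# dominant value can influence several outputs instead of exactly one.
overlapping = max_pool2d(fmap, window=3, stride=2)

print(regular.shape, overlapping.shape)  # both (6, 6)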