Add ZFNet

Tobias Eidelpes 2023-11-08 10:50:01 +01:00
parent f72e29d6ad
commit 0d05347496


@@ -1106,7 +1106,7 @@ possible on \glspl{gpu}. The whole network operates on an almost real
time scale by being able to process \qty{5}{images\per\s} and
maintaining high state-of-the-art \gls{map} values of 73.2\%
(\gls{voc} 2007). If the detection network is switched from VGGNet
-\cite{liu2015} to ZF-Net \cite{zeiler2013}, Faster R-\gls{cnn} is able
+\cite{liu2015} to ZF-Net \cite{zeiler2014}, Faster R-\gls{cnn} is able
to achieve \qty{17}{images\per\s}, albeit at a lower \gls{map} of
59.9\%.
@@ -1421,6 +1421,35 @@ researchers to apply \glspl{cnn} to the problem of object detection.
\subsubsection{ZFNet}
\label{sssec:theory-zfnet}
ZFNet's \cite{zeiler2014} contributions to the image classification
field are twofold. First, the authors develop a way to visualize the
internals of a \gls{cnn} using \emph{deconvolution} techniques.
Second, with the knowledge gained from looking \emph{inside} a
\gls{cnn}, they improve AlexNet's structure. The
deconvolution technique is essentially the reverse operation of a
\gls{cnn} layer. Instead of pooling (downsampling) the results of the
layer, \textcite{zeiler2014} \emph{unpool} the max-pooled values by
recording the positions of the maxima (the \emph{switches}) within
each pooling region. The maximum values are then placed back at their
recorded positions in each region (e.g.\ a $2 \times 2$ area,
depending on the kernel size). This process loses information because
a max-pooling layer is not invertible. The subsequent \gls{relu}
function is approximated by simply applying \gls{relu} again: negative
values are squashed to zero and positive values are retained, which
keeps the reconstructed feature maps non-negative. The final
deconvolution step concerns the convolutional layer itself. In order
to \emph{reconstruct} the original spatial dimensions (before
convolution), a transposed convolution with transposed versions of the
learned filters is performed. This reverses the downsampling that
happens during convolution.
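To make the unpooling step concrete, the following is a minimal NumPy
sketch of max-pooling with recorded switches and the corresponding
unpooling (the function names and the restriction to a
single-channel 2D map are our own simplifications, not code from
\textcite{zeiler2014}):
\begin{verbatim}
import numpy as np

def max_pool_with_switches(x, k=2):
    # Pool a 2D map, recording where each maximum came from.
    h, w = x.shape
    pooled = np.zeros((h // k, w // k), dtype=x.dtype)
    switches = np.zeros((h // k, w // k), dtype=np.int64)
    for i in range(h // k):
        for j in range(w // k):
            region = x[i*k:(i+1)*k, j*k:(j+1)*k]
            idx = np.argmax(region)      # flat index of the maximum
            pooled[i, j] = region.flat[idx]
            switches[i, j] = idx         # the "switch" for this region
    return pooled, switches

def unpool(pooled, switches, k=2):
    # Place each maximum back at its recorded position; all other
    # entries stay zero, which is why information is lost.
    h, w = pooled.shape
    out = np.zeros((h * k, w * k), dtype=pooled.dtype)
    for i in range(h):
        for j in range(w):
            di, dj = divmod(int(switches[i, j]), k)
            out[i*k + di, j*k + dj] = pooled[i, j]
    return out
\end{verbatim}
Applying \texttt{np.maximum(out, 0)} to the unpooled map then mirrors
the \gls{relu} step described above.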
With these techniques in place, the authors visualize the feature
maps of the first and second layers of AlexNet. They identify
multiple problems with the learned filters, such as aliasing
artifacts caused by the large stride and a mix of very low and high
frequency information with little coverage of the mid frequencies.
These results indicate that the $11 \times 11$ filters in AlexNet's
first layer are too large, so the authors reduce them to
$7 \times 7$. Additionally, they reduce the original stride from four
to two. These two changes improve the top-5 error rate by 1.6
percentage points over their own replicated AlexNet result of
18.1\%.
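As an illustration of how much these two hyperparameters change the
first layer's output resolution, consider the following PyTorch
sketch (the $224 \times 224$ input size and the omission of padding
and all later layers are simplifying assumptions on our part):
\begin{verbatim}
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # a single RGB input image

# AlexNet-style first layer: 11x11 filters with stride 4.
alexnet_conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4)
# ZFNet's modification: 7x7 filters with stride 2.
zfnet_conv1 = nn.Conv2d(3, 96, kernel_size=7, stride=2)

print(alexnet_conv1(x).shape)  # torch.Size([1, 96, 54, 54])
print(zfnet_conv1(x).shape)    # torch.Size([1, 96, 109, 109])
\end{verbatim}
The smaller filters and stride retain considerably more spatial
detail in the first-layer feature maps.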
\subsubsection{GoogLeNet}
\label{sssec:theory-googlenet}