Remove todos
parent a8f7b1b6e3
commit 6dc53be531
@@ -701,9 +701,8 @@ process because the outputs do not provide valuable information. In
 contrast to the Heaviside step function
 (section~\ref{sssec:theory-heaviside}), it is differentiable which
 allows it to be used with gradient descent optimization
-algorithms. \todo[noline]{link to gradient descent and vanishing
-gradient sections} Unfortunately, the sigmoid function suffers from
-the vanishing gradient problem, which makes it unsuitable for training
+algorithms. Unfortunately, the sigmoid function exacerbates the
+vanishing gradient problem, which makes it unsuitable for training
 deep neural networks.
 
 \subsubsection{Rectified Linear Unit}
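Reviewer note, not part of the diff: a minimal sketch of why the sigmoid gradient vanishes, assuming the standard logistic definition; the symbol $\sigma$ and the bound below are illustrative and not taken from the thesis text shown here.

\[
  \sigma(x) = \frac{1}{1 + e^{-x}},
  \qquad
  \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \frac{1}{4}
\]

Since backpropagation multiplies these factors layer by layer, a stack of $n$ sigmoid layers scales the gradient by at most $(1/4)^n$, which is the vanishing gradient behaviour the changed sentence refers to.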
@@ -727,13 +726,12 @@ feature extractor. The \gls{relu} function is nearly linear, and it
 thus preserves many of the properties that make linear models easy to
 optimize with gradient-based methods \cite{goodfellow2016}. In
 contrast to the sigmoid activation function, the \gls{relu} function
-overcomes the vanishing gradient problem \todo{link to vanishing
-gradient problem} and is therefore suitable for training deep neural
-networks. Furthermore, the \gls{relu} function is easier to calculate
-than sigmoid functions which allows networks to be trained more
-quickly. Even though it is not differentiable at $0$, it is
-differentiable everywhere else and often used with gradient descent
-during optimization.
+partially mitigates the vanishing gradient problem and is therefore
+suitable for training deep neural networks. Furthermore, the
+\gls{relu} function is easier to calculate than sigmoid functions
+which allows networks to be trained more quickly. Even though it is
+not differentiable at $0$, it is differentiable everywhere else and
+often used with gradient descent during optimization.
 
 The \gls{relu} function suffers from the dying \gls{relu} problem,
 which can cause some neurons to become inactive. Large gradients,
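Reviewer note, not part of the diff: for reference, a minimal sketch of the \gls{relu} definition and derivative discussed in this hunk, assuming the common convention of assigning the subgradient $0$ at $x = 0$; the notation is illustrative and not taken from the thesis text shown here.

\[
  \operatorname{ReLU}(x) = \max(0, x),
  \qquad
  \operatorname{ReLU}'(x) =
  \begin{cases}
    1 & \text{if } x > 0 \\
    0 & \text{if } x < 0
  \end{cases}
\]

Because the derivative is exactly $1$ for positive inputs, repeated multiplication during backpropagation does not shrink the gradient, while the $0$ branch is what lets a neuron whose pre-activation stays negative stop updating, the dying \gls{relu} problem mentioned above.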