Remove todos

Tobias Eidelpes 2023-11-09 20:22:23 +01:00
parent a8f7b1b6e3
commit 6dc53be531


@@ -701,9 +701,8 @@ process because the outputs do not provide valuable information. In
contrast to the Heaviside step function
(section~\ref{sssec:theory-heaviside}), it is differentiable which
allows it to be used with gradient descent optimization
-algorithms. \todo[noline]{link to gradient descent and vanishing
-gradient sections} Unfortunately, the sigmoid function suffers from
-the vanishing gradient problem, which makes it unsuitable for training
+algorithms. Unfortunately, the sigmoid function exacerbates the
+vanishing gradient problem, which makes it unsuitable for training
deep neural networks.
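
As an aside on why the gradient vanishes here, the standard bound on the sigmoid derivative can be stated in a short LaTeX sketch (notation chosen for illustration, not taken from the surrounding file):

% Sigmoid and its derivative; the derivative peaks at 1/4 (at x = 0)
% and decays towards 0 for large |x|, so repeated multiplication of
% such factors during backpropagation shrinks gradients in deep nets.
\begin{equation}
  \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
  \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \tfrac{1}{4}
\end{equation}

Since each sigmoid activation contributes a factor of at most $1/4$ on its own, these factors compound multiplicatively with network depth.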
\subsubsection{Rectified Linear Unit}
@@ -727,13 +726,12 @@ feature extractor. The \gls{relu} function is nearly linear, and it
thus preserves many of the properties that make linear models easy to
optimize with gradient-based methods \cite{goodfellow2016}. In
contrast to the sigmoid activation function, the \gls{relu} function
-overcomes the vanishing gradient problem \todo{link to vanishing
-gradient problem} and is therefore suitable for training deep neural
-networks. Furthermore, the \gls{relu} function is easier to calculate
-than sigmoid functions which allows networks to be trained more
-quickly. Even though it is not differentiable at $0$, it is
-differentiable everywhere else and often used with gradient descent
-during optimization.
+partially mitigates the vanishing gradient problem and is therefore
+suitable for training deep neural networks. Furthermore, the
+\gls{relu} function is easier to calculate than sigmoid functions
+which allows networks to be trained more quickly. Even though it is
+not differentiable at $0$, it is differentiable everywhere else and
+often used with gradient descent during optimization.
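
For comparison, the \gls{relu} gradient can be sketched the same way (again an illustrative aside; the piecewise derivative below is the standard definition and is undefined at $0$):

% ReLU and its derivative: the slope is exactly 1 for positive inputs,
% so gradients pass through active units without shrinking; for
% negative inputs the slope is 0, which underlies the dying ReLU
% problem discussed next.
\begin{equation}
  \operatorname{ReLU}(x) = \max(0, x), \qquad
  \frac{d}{dx}\,\operatorname{ReLU}(x) =
  \begin{cases}
    1 & \text{if } x > 0,\\
    0 & \text{if } x < 0.
  \end{cases}
\end{equation}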
The \gls{relu} function suffers from the dying \gls{relu} problem,
which can cause some neurons to become inactive. Large gradients,