Remove todos

Tobias Eidelpes 2023-11-09 20:22:23 +01:00
parent a8f7b1b6e3
commit 6dc53be531


@@ -701,9 +701,8 @@ process because the outputs do not provide valuable information. In
 contrast to the Heaviside step function
 (section~\ref{sssec:theory-heaviside}), it is differentiable which
 allows it to be used with gradient descent optimization
-algorithms. \todo[noline]{link to gradient descent and vanishing
-gradient sections} Unfortunately, the sigmoid function suffers from
-the vanishing gradient problem, which makes it unsuitable for training
+algorithms. Unfortunately, the sigmoid function exacerbates the
+vanishing gradient problem, which makes it unsuitable for training
 deep neural networks.

 \subsubsection{Rectified Linear Unit}
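
For reference, a minimal sketch of the relation the amended paragraph relies on (standard definitions, not quoted from the thesis source): the sigmoid and its derivative are

\begin{align}
  \sigma(x)  &= \frac{1}{1 + e^{-x}}, &
  \sigma'(x) &= \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \tfrac{1}{4},
\end{align}

so each sigmoid layer contributes a backpropagated factor of at most one quarter, and the product of many such factors shrinks toward zero in deep networks, which is the vanishing gradient problem the revised text names.
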
@@ -727,13 +726,12 @@ feature extractor. The \gls{relu} function is nearly linear, and it
 thus preserves many of the properties that make linear models easy to
 optimize with gradient-based methods \cite{goodfellow2016}. In
 contrast to the sigmoid activation function, the \gls{relu} function
-overcomes the vanishing gradient problem \todo{link to vanishing
-gradient problem} and is therefore suitable for training deep neural
-networks. Furthermore, the \gls{relu} function is easier to calculate
-than sigmoid functions which allows networks to be trained more
-quickly. Even though it is not differentiable at $0$, it is
-differentiable everywhere else and often used with gradient descent
-during optimization.
+partially mitigates the vanishing gradient problem and is therefore
+suitable for training deep neural networks. Furthermore, the
+\gls{relu} function is easier to calculate than sigmoid functions
+which allows networks to be trained more quickly. Even though it is
+not differentiable at $0$, it is differentiable everywhere else and
+often used with gradient descent during optimization.

 The \gls{relu} function suffers from the dying \gls{relu} problem,
 which can cause some neurons to become inactive. Large gradients,
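
For reference, a minimal sketch of why the amended paragraph describes the mitigation as partial (standard definitions, not quoted from the thesis source): the \gls{relu} function and its derivative are

\begin{align}
  \operatorname{ReLU}(x)  &= \max(0, x), &
  \operatorname{ReLU}'(x) &=
    \begin{cases}
      1, & x > 0,\\
      0, & x < 0,
    \end{cases}
\end{align}

so the backpropagated gradient passes through unchanged for positive inputs, avoiding the shrinking factors of the sigmoid, while inputs that remain negative receive a zero gradient, which is the mechanism behind the dying \gls{relu} problem mentioned in the surrounding context.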