Add Spectral Features section

Tobias Eidelpes 2021-10-23 19:40:36 +02:00
parent 78e4465ec4
commit 7c55d2f024

sim.tex

lowest at around 4kHz, where we hear best. High energies are needed to hear very
low frequencies. The threshold on the higher end marks the point at which sounds
become painful and it seeks to protect us from damaging our hearing.
\section{Spectral Features}
Because audio is very hard to analyze in the time domain, especially when
identifying overtones, we apply transforms to the original data to extract more
information. One such transformation was used by Pierre-Simon Laplace in 1785 to
turn problems requiring difficult operations into problems that are solvable
with simpler operations. The result of the easier calculation is then
transformed back into the domain of the original problem. This first type of
transformation, the \emph{integral transform}, multiplies a function by a kernel
and integrates the product to obtain a \emph{spectrum}. Applying the
corresponding inverse transform to the spectrum yields the original function
again (back transformation). The kernel used for this operation by the Laplace
transform is given in \eqref{eq:laplace-kernel}.
\begin{equation}
\label{eq:laplace-kernel}
K_{xy} = e^{-xy}
\end{equation}
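
To make the kernel more tangible, the following Python sketch approximates the
Laplace transform $F(y) = \int_0^\infty f(x)\,K_{xy}\,dx$ numerically, with $x$
playing the role of time $t$ and $y$ that of the transform variable $s$. It
assumes NumPy is available; the helper \texttt{laplace\_numeric} and the
truncation of the integral at \texttt{t\_max} are illustrative choices, not part
of any library.

```python
import numpy as np

def laplace_numeric(f, s: float, t_max: float = 50.0, n: int = 200_001) -> float:
    """Approximate F(s) = integral_0^inf f(t) * exp(-s*t) dt (trapezoidal rule).

    Assumption: the integral is truncated at t_max, which is only accurate
    when f(t) * exp(-s*t) has decayed to practically zero by then.
    """
    t = np.linspace(0.0, t_max, n)
    integrand = f(t) * np.exp(-s * t)        # multiply by the kernel K = e^{-ts}
    dt = t[1] - t[0]
    # trapezoidal rule written out explicitly to avoid version-specific helpers
    return float(dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])))

def decaying_exp(t: np.ndarray) -> np.ndarray:
    """f(t) = e^{-2t}, whose Laplace transform is known to be 1 / (s + 2)."""
    return np.exp(-2.0 * t)

for s in (0.5, 1.0, 4.0):
    print(s, laplace_numeric(decaying_exp, s), 1.0 / (s + 2.0))
```

For $f(t) = e^{-2t}$ the numerical values should agree closely with the
closed-form result $1/(s+2)$.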

In 1823, Fourier proposed adding the imaginary unit $i$ to the exponent, which
turns the kernel into $K_{xy} = e^{-ixy}$. By Euler's formula it can be
rewritten as $\cos(xy) - i\cdot\sin(xy)$, so the spectrum can be interpreted
much more easily in terms of simple trigonometric functions. The Fourier
transform can thus be seen as a similarity measurement: it compares the signal
with a set of sinusoids of different frequencies, and each coefficient states
how strongly the corresponding sinusoid contributes when the sinusoids are
superimposed to form the signal. The imaginary part of the Fourier transform can
be dealt with either by discarding the $\sin$ part (keeping only the real part)
or by computing the magnitude as the square root of the squared real part plus
the squared imaginary part, $|F| = \sqrt{\mathrm{Re}(F)^2 + \mathrm{Im}(F)^2}$.
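
The kernel view can be checked directly. The sketch below (assuming NumPy;
\texttt{dft} is a hypothetical helper) builds the discrete counterpart of the
Fourier kernel, $e^{-2\pi i kn/N} = \cos(2\pi kn/N) - i\sin(2\pi kn/N)$, as a
matrix, applies it to a test signal made of two sinusoids and computes the
magnitude from the real and imaginary parts.

```python
import numpy as np

def dft(x: np.ndarray) -> np.ndarray:
    """Naive DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N), i.e. cos - i*sin."""
    n = len(x)
    idx = np.arange(n)
    kernel = np.exp(-2j * np.pi * np.outer(idx, idx) / n)   # kernel[k, n]
    return kernel @ x

fs = 64                                   # samples per second (assumed)
t = np.arange(fs) / fs                    # one second of signal
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

X = dft(x)
magnitude = np.sqrt(X.real**2 + X.imag**2)        # same as np.abs(X)
peaks = np.argsort(magnitude[: fs // 2])[-2:]     # two strongest bins, first half
print(sorted(peaks.tolist()))  # [5, 12]
```

The result matches \texttt{numpy.fft.fft}, which computes the same transform in
$O(N \log N)$ instead of $O(N^2)$ operations.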

One property of the Fourier transform is that the more coefficients are
discarded, the less well high frequencies are represented. Steep changes in the
signal are spread out over more samples, and if only a small fraction of the
coefficients is kept, the reconstruction approaches a basic sine wave. Another
property, relevant for images, is that most of the energy of an (unshifted)
image spectrum is located at its ends, where the low frequencies lie. There is
hardly any information in the mid-range of the spectrum, which holds the highest
frequencies, so the bulk of the information in images lies at the edges of the
spectrum.

Because the transform treats a finite frame of the signal as if it repeated
periodically, discontinuities at the frame boundaries smear energy across the
whole spectrum (\emph{spectral leakage}). Smoothing functions are used to reduce
this effect and represent the data we are interested in more accurately: the
signal frame is multiplied element-wise by a \emph{window function} that tapers
off towards the frame boundaries before the transform is computed. Important
ones are the triangular (Bartlett), Gaussian, raised-cosine (Hamming),
Bessel-function-based (Kaiser) and simple rectangular windows. This step is
known as \emph{windowing}.
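
A minimal sketch of windowing, assuming NumPy: a sinusoid with 10.5 cycles per
frame does not fit the frame exactly, so its spectrum leaks. The hypothetical
helper \texttt{leakage} multiplies the frame element-wise by a window before the
FFT and reports the largest magnitude far from the peak, relative to the peak.
The Kaiser parameter $\beta = 8.6$ is an arbitrary illustrative choice.

```python
import numpy as np

n = 256
frame = np.sin(2 * np.pi * 10.5 * np.arange(n) / n)   # 10.5 cycles: not periodic in the frame

def leakage(signal: np.ndarray, window: np.ndarray) -> float:
    """Largest magnitude from bin 30 upwards, relative to the spectral peak."""
    spectrum = np.abs(np.fft.rfft(signal * window))    # element-wise product, then FFT
    return float(spectrum[30:].max() / spectrum.max())

windows = {
    "rectangular": np.ones(n),
    "bartlett": np.bartlett(n),      # triangular
    "hamming": np.hamming(n),        # raised cosine
    "hann": np.hanning(n),           # raised cosine reaching zero at the ends
    "kaiser": np.kaiser(n, 8.6),     # Bessel-function based
}
leak = {name: leakage(frame, w) for name, w in windows.items()}
for name, value in leak.items():
    print(f"{name:12s} {value:.2e}")
```

All tapering windows should report far less leakage than the rectangular one;
they pay for this with a wider main lobe around the peak.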
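
A third transformation is the \emph{discrete cosine transform} (DCT). It applies
a kernel of the form $K_{xy} = \cos(xy + \frac{y}{2})$, which in the discrete
case is scaled by $\frac{\pi}{N}$ for a signal of $N$ samples. Because the
Fourier transform treats the signal as periodic, the jump between the two ends
of an image block spreads energy over many coefficients, so a lot of image
information is lost quickly when coefficients are thrown away. The cosine
transform implicitly mirrors the signal at its boundaries, which avoids this
jump and concentrates the energy in a few low-frequency coefficients. It
therefore retains most of the relevant information even after almost 90\% of
the coefficients have been thrown away. In this sense the Fourier transform is
uniform, while the cosine transform discriminates: a few of its coefficients
carry most of the information.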
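
The following sketch, assuming NumPy, compares how well the two transforms
compact the energy of a ramp, whose two ends do not match. The hypothetical
helper \texttt{dct\_matrix} builds an orthonormal DCT-II with the kernel
$\cos\bigl(\frac{\pi}{N}(x + \frac{1}{2})y\bigr)$; \texttt{keep\_largest} keeps
only the six largest coefficients (roughly a tenth of 64) before transforming
back.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: C[y, x] = a_y * cos(pi/n * (x + 1/2) * y)."""
    y = np.arange(n)[:, None]            # output index (frequency)
    x = np.arange(n)[None, :]            # input index (sample)
    c = np.cos(np.pi / n * (x + 0.5) * y)
    c[0, :] *= np.sqrt(1.0 / n)          # scaling makes the rows orthonormal
    c[1:, :] *= np.sqrt(2.0 / n)
    return c

def keep_largest(coeffs: np.ndarray, keep: int) -> np.ndarray:
    """Zero all but the `keep` coefficients with the largest magnitude."""
    out = np.zeros_like(coeffs)
    idx = np.argsort(np.abs(coeffs))[-keep:]
    out[idx] = coeffs[idx]
    return out

def rms(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(np.mean((a - b) ** 2)))

n, keep = 64, 6                          # keep roughly 10 % of the coefficients
signal = np.linspace(0.0, 1.0, n)        # a ramp: its two ends do not match

# Fourier: the periodic extension creates a jump between the ends of the ramp
fft_rec = np.fft.ifft(keep_largest(np.fft.fft(signal), keep)).real

# Cosine: the mirrored extension avoids the jump, so the energy is more compact
C = dct_matrix(n)
dct_rec = C.T @ keep_largest(C @ signal, keep)   # C is orthonormal, so C.T inverts it

err_fft = rms(signal, fft_rec)
err_dct = rms(signal, dct_rec)
print(f"FFT error: {err_fft:.4f}  DCT error: {err_dct:.4f}")
```

For this signal the DCT reconstruction error should be far smaller than the FFT
one, since the jump introduced by the periodic extension needs many Fourier
coefficients to be represented.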

Other kernels of interest are the \emph{Mexican hat} and the \emph{Gabor}
wavelets. In optics, \emph{Zernike polynomials} are used to compare the
measurement of a lens with \emph{ideal} optics modeled by Zernike polynomials.
If the error exceeds a pre-defined threshold, the optics require a closer look
during quality assurance.
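
For reference, a sketch of the two wavelets in one dimension, assuming NumPy and
the common textbook forms: the unnormalized Mexican hat $(1 - t^2)e^{-t^2/2}$
and a complex Gabor kernel, i.e.\ a Gaussian envelope times $e^{2\pi i f t}$. The
width $\sigma = 1$, the sampling grid and the test frequencies are arbitrary
choices, and \texttt{response} is a hypothetical helper.

```python
import numpy as np

def mexican_hat(t: np.ndarray) -> np.ndarray:
    """Unnormalized Mexican hat: the negative second derivative of exp(-t^2 / 2)."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def gabor(t: np.ndarray, freq: float, sigma: float = 1.0) -> np.ndarray:
    """Complex Gabor kernel: a Gaussian envelope times a complex sinusoid."""
    return np.exp(-t**2 / (2.0 * sigma**2)) * np.exp(2j * np.pi * freq * t)

t = np.linspace(-8.0, 8.0, 1601)   # sampling grid, wide enough for both kernels
dt = t[1] - t[0]

# Zero mean (admissibility): the Mexican hat integrates to approximately 0
print(np.sum(mexican_hat(t)) * dt)

def response(signal_freq: float, kernel_freq: float = 1.0) -> float:
    """Magnitude of the inner product of a Gabor kernel with a cosine."""
    signal = np.cos(2 * np.pi * signal_freq * t)
    return float(abs(np.sum(gabor(t, kernel_freq) * signal) * dt))

# The Gabor kernel responds strongly only near its own frequency
print(response(1.0), response(3.0))
```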

While integral transforms allow the original signal to be reconstructed from the
transformed spectrum, \emph{parametric transforms}, which map an image into a
space of parameters describing structures in it, generally do not provide that
property. One such transformation is the \emph{Radon transformation}, where an
axis is defined and the luminance values along that axis are summed. The axis is
then rotated and the process repeated until all angles have been traversed.
Rotating the image only shifts the result along the angle axis, so
rotation-invariant features can be derived from it easily, which makes the
transformation a useful pre-processing step for feature engineering. The
\emph{Hough transformation} uses the gradients of an image to find edge pixels,
each of which votes for all lines that pass through it; the votes are collected
in a histogram over the line parameters. Peaks in this histogram are valuable for
detecting regular patterns or long edges in an image.
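
The voting idea behind the Hough transformation can be sketched for straight
lines as follows, assuming NumPy and an already computed binary edge map (for
instance a thresholded gradient magnitude). Each edge pixel $(x, y)$ votes for
every line $\rho = x\cos\theta + y\sin\theta$ through it; \texttt{hough\_lines}
and the one-degree resolution of $\theta$ are illustrative assumptions.

```python
import numpy as np

def hough_lines(edges: np.ndarray, n_theta: int = 180):
    """Accumulate votes in a (rho, theta) histogram for every edge pixel.

    Each edge pixel (x, y) votes for all lines rho = x*cos(theta) + y*sin(theta).
    Returns the accumulator and the offset added to rho so negative values fit.
    """
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))              # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))          # 0..179 degrees
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    columns = np.arange(n_theta)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, columns] += 1               # one vote per theta
    return acc, diag

edges = np.zeros((32, 32), dtype=bool)
edges[10, :] = True                                  # a horizontal edge at y = 10

acc, diag = hough_lines(edges)
rho_idx, theta_deg = np.unravel_index(np.argmax(acc), acc.shape)
print(rho_idx - diag, theta_deg)  # 10 90
```

The peak of the accumulator recovers the horizontal edge at $\rho = 10$,
$\theta = 90^\circ$.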
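
Applications of the Fourier transform include identifying spectral peaks, tune
recognition and timbre recognition. All of these use a form of the
\emph{short-time Fourier transform} (STFT), which computes the spectra of
successive, windowed frames of the signal. The FT can also be used for optical
flow: a translation of the image only changes the phase of its spectrum, so the
shift between two images can be recovered from the phase difference of their
spectra (\emph{phase correlation}). Both the FT and the CT are used successfully
in music recognition and speech recognition through the \emph{mel-frequency
cepstral coefficients} (MFCC), which are obtained by applying the CT to the
logarithm of a mel-scaled Fourier power spectrum. The CT is used in MPEG-7 for
\emph{color histogram encoding} and for texture computation.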
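
Phase correlation can be sketched in a few lines, assuming NumPy and a purely
cyclic shift between the two images (real image pairs usually also need
windowing and sub-pixel refinement). The hypothetical \texttt{phase\_correlation}
normalizes the cross-power spectrum to unit magnitude so that only the phase
difference remains; its inverse FFT then peaks at the shift.

```python
import numpy as np

def phase_correlation(a: np.ndarray, b: np.ndarray) -> tuple[int, int]:
    """Estimate the cyclic shift (dy, dx) that maps image a onto image b."""
    fa, fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = fb * np.conj(fa)                 # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # keep only the phase difference
    corr = np.fft.ifft2(cross).real          # a sharp peak lies at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # indices beyond half the size correspond to negative (wrapped) shifts
    dy = dy - h if dy > h // 2 else dy
    dx = dx - w if dx > w // 2 else dx
    return int(dy), int(dx)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, -3), axis=(0, 1))   # down by 5, left by 3
print(phase_correlation(img, shifted))  # (5, -3)
```

Shifts larger than half the image size wrap around, which is why the function
maps them to negative values.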
\section{Semantic Modeling 200 words}