Add Spectral Features section

2021-10-23 19:40:36 +02:00 · 2021-10-23 19:40:36 +02:00 · 7c55d2f024
commit 7c55d2f024
parent 78e4465ec4
1 changed files with 73 additions and 1 deletions
--- a/sim.tex
+++ b/sim.tex
@@ -407,7 +407,79 @@ lowest at around 4kHz, where we hear best. High energies are needed to hear very
 low frequencies. The threshold on the higher end marks the point at which sounds
 become painful and it seeks to protect us from damaging our hearing.
-\section{Spectral Features 600 words}
+\section{Spectral Features}
 Because analysis of audio in the time domain is difficult, especially for
 identifying overtones, we apply transforms to the original data to extract more
 information. One such transformation was used by Pierre-Simon Laplace in 1785 to
 convert problems requiring difficult operations into other problems which
 are solvable with simpler operations. The results of the easier calculation
 are then transformed back into the original problem. This first type
 of transformation takes a function and integrates it against a kernel to
 produce a \emph{spectrum}. Applying the inverse kernel to the spectrum
 recovers the original function (the back transformation). The kernel
 which proved suitable for this operation is given in \eqref{eq:laplace-kernel}.
 \begin{equation}
    \label{eq:laplace-kernel}
    K_{xy} = e^{-xy}
 \end{equation}
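 As a sketch of how this kernel is applied, and assuming the usual one-sided
 convention (the symbols $f$ and $F$ are introduced here for illustration
 only), the spectrum $F$ of a function $f$ can be written as
 \begin{equation}
    F(y) = \int_{0}^{\infty} f(x)\, K_{xy}\, \mathrm{d}x
         = \int_{0}^{\infty} f(x)\, e^{-xy}\, \mathrm{d}x .
 \end{equation}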
 In 1823, instead of just having $-xy$ in the exponent, Fourier proposed to
 multiply it by the imaginary unit $i$. The kernel can then be rewritten as
 $\cos(xy) - i\cdot\sin(xy)$, which makes it possible to interpret the original
 function much more easily using simple angular functions. The Fourier
 transformation can be read as a similarity measurement: each coefficient
 expresses how similar the signal is to an angular function of a given
 frequency, and the signal is the superposition of these functions. The
 imaginary part of the Fourier transform can be dealt with by either throwing the
 $\sin$ part of the function away or by computing the magnitude by taking the
 root of the squared real part plus the squared imaginary part.
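 Written out, with $F(y)$ denoting the spectrum of a function $f(x)$ (symbols
 introduced here for illustration, and assuming \texttt{amsmath} is loaded),
 the kernel and the magnitude read
 \begin{equation}
    K_{xy} = e^{-ixy} = \cos(xy) - i\sin(xy), \qquad
    |F(y)| = \sqrt{\operatorname{Re}(F(y))^{2} + \operatorname{Im}(F(y))^{2}} .
 \end{equation}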
 One property of the Fourier transform is that high frequencies are represented
 increasingly poorly the more coefficients are thrown away during the process.
 Steep changes in the signal are spread out over more samples and if only a small
 fraction of coefficients is used, the reconstruction is reduced to a basic sine
 wave. Another property for image information is that the most important parts of
 the image are located at the ends of the spectrum. There is hardly any
 information in the mid-range of the spectrum, which is why it can be concluded
 that the bulk of the information in images lies in the edges. Since the middle
 part of any spectrum is usually smoothed by the extremes at the edges, smoothing
 functions are used to more accurately represent the data we are interested in.
 They work by multiplying the signal element-wise with a window function before
 the transform. Important ones are the triangular function (Bartlett), a
 raised-cosine function (Hamming), a Bessel-function-based window (Kaiser) and a
 simple rectangular function.
 This step is known as \emph{windowing}.
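 As an illustration, under the usual convention that the window multiplies a
 frame $x[n]$ of $N$ samples with $n = 0, \dots, N-1$ (notation introduced
 here), the windowed frame and two of the windows in their standard textbook
 form are
 \begin{align}
    x_w[n] &= w[n]\, x[n], \\
    w_{\mathrm{Bartlett}}[n] &= 1 - \left| \frac{2n}{N-1} - 1 \right|, \\
    w_{\mathrm{Hamming}}[n] &= 0.54 - 0.46 \cos\!\left( \frac{2\pi n}{N-1} \right).
 \end{align}
 The rectangular window corresponds to $w[n] = 1$ for all $n$.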
 A third transformation is the \emph{discrete cosine transform}. This applies a
 kernel of the form $K = \cos(xy+\frac{y}{2})$. Due to the uniform nature of the
 Fourier transform, a lot of image information is quickly lost when coefficients
 are thrown away. The cosine transformation, however, manages to retain much more
 relevant information even until almost 90\% of coefficients have been thrown
 away. In contrast to the Fourier transform, which is uniform, it discriminates:
 most of the signal energy is concentrated in a few coefficients.
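 One common concrete form of this kernel is the DCT-II, sketched here for a
 sequence $x_n$ of $N$ samples (the symbols $x_n$, $X_k$ and $N$ are introduced
 for illustration). Setting $x = n$ and $y = \pi k / N$ in the cosine kernel
 gives
 \begin{equation}
    X_k = \sum_{n=0}^{N-1} x_n
          \cos\!\left[ \frac{\pi}{N} \left( n + \frac{1}{2} \right) k \right],
    \qquad k = 0, \dots, N-1 .
 \end{equation}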
 Other kernels of interest are the \emph{Mexican hat} wavelet and the \emph{Gabor
 wavelet}. In the area of optics, \emph{Zernike polynomials} are used to compare
 a measurement of a lens to \emph{ideal} optics modeled by a Zernike function. If
 a pre-defined threshold for the error is exceeded, the optics require a closer
 look during the quality assurance process.
 While integral transforms allow the original signal to be reconstructed from the
 transformed spectrum, \emph{parametric transforms} generally do not provide
 that property.
 One such transformation is the \emph{Radon transformation}, where an axis is
 defined and the luminance values along that axis are added. The axis is then
 rotated and the process repeated until all angles have been traversed. A
 rotation of the image only shifts the resulting spectrum along its angle axis,
 and the transformation is therefore a useful pre-processing step for
 rotation-robust feature engineering. The \emph{Hough
 transformation} uses the gradients of an image to construct a histogram. The
 information presented in the histogram can be valuable for detecting regular
 patterns or long edges in an image.
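 In the continuous setting, the summation along a rotated axis is commonly
 formalised as follows. The notation is assumed here: $f(x, y)$ is the image,
 $\theta$ the projection angle and $\rho$ the offset, and $x$ and $y$ denote
 image coordinates rather than the kernel variables used earlier.
 \begin{equation}
    R(\rho, \theta) = \iint f(x, y)\,
        \delta(x\cos\theta + y\sin\theta - \rho)\, \mathrm{d}x\, \mathrm{d}y .
 \end{equation}
 The same line parametrisation $\rho = x\cos\theta + y\sin\theta$ is the one
 typically used for the accumulator of the Hough transformation when detecting
 straight lines.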
 Applications for the Fourier transform are identifying spectral peaks, tune
 recognition and timbre recognition. All of these use a form of \emph{short-time
 Fourier transform} (STFT). The FT can also be used for optical flow by shifting
 the image and recomputing the spectrum (\emph{phase correlation}). Both the FT
 and the CT are successfully used in music recognition and speech recognition
 with the \emph{mel-frequency cepstral coefficients} (MFCC). The CT is used in
 MPEG-7 for \emph{color histogram encoding} and for texture computation.
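 For reference, a discrete STFT can be sketched as follows, assuming a signal
 $x[n]$, a window function $w$ and a frame position $m$ (all notation
 introduced here for illustration):
 \begin{equation}
    X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n]\, w[n - m]\, e^{-i\omega n} .
 \end{equation}
 Each fixed $m$ yields the spectrum of one short, windowed excerpt of the
 signal.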
 \section{Semantic Modeling 200 words}