From 7c55d2f024acbe221d26182655b613438ebb5473 Mon Sep 17 00:00:00 2001
From: Tobias Eidelpes
Date: Sat, 23 Oct 2021 19:40:36 +0200
Subject: [PATCH] Add Spectral Features section

---
 sim.tex | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/sim.tex b/sim.tex
index 24ca2f8..7599164 100644
--- a/sim.tex
+++ b/sim.tex
@@ -407,7 +407,79 @@ lowest at around 4kHz, where we hear best. High energies are needed to hear
 very low frequencies. The threshold on the higher end marks the point at which
 sounds become painful and it seeks to protect us from damaging our hearing.
 
-\section{Spectral Features 600 words}
+\section{Spectral Features}
+
+Because analyzing audio in the time domain is very hard, especially for
+identifying overtones, we apply transforms to the original data to extract more
+information. One such transformation was used by Pierre-Simon Laplace in 1785 to
+transform problems requiring difficult operations into other problems which
+are solvable with simpler operations. The result of the easier calculation
+is then transformed back into the domain of the original problem. This first type
+of transformation takes a function and applies a kernel to it with positive
+convolution to obtain a \emph{spectrum}. Applying the kernel to the spectrum
+again recovers the original function (the back transformation). The kernel
+which proved suitable for this operation is given in \eqref{eq:laplace-kernel}.
+
+\begin{equation}
+  \label{eq:laplace-kernel}
+  K_{xy} = e^{-xy}
+\end{equation}
+
+In 1823, instead of just having $-xy$ in the exponent, Fourier proposed to add
+the imaginary unit $i$. The kernel $e^{-ixy}$ can be rewritten as $\cos(xy) -
+i\cdot\sin(xy)$, which makes it possible to interpret the original function much
+more easily using simple angular functions.
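+As a sketch of why this helps, assume the transform is written as the integral
+$F(y) = \int f(x)\,e^{-ixy}\,dx$ (notation and normalization are chosen here for
+illustration and are not taken from the lecture). The kernel then splits the
+transform into a cosine part and a sine part:
+\[
+  F(y) = \int f(x)\cos(xy)\,dx \;-\; i\int f(x)\sin(xy)\,dx .
+\]
+Each integral compares $f$ with one angular function of frequency $y$.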
+The Fourier transform can be read as a similarity measurement: it measures how
+similar the signal is to a set of overlaid angular functions and stores the result
+as a set of coefficients. The imaginary part of the Fourier transform can be dealt
+with either by throwing the $\sin$ part away or by computing the magnitude, i.e.\
+the square root of the squared real part plus the squared imaginary part.
+
+One property of the Fourier transform is that high frequencies get increasingly
+less well-sampled the more information is thrown away during the process. Steep
+changes in frequency are spread out over more samples and if only a small
+fraction of coefficients is used, the transformation results in a basic sine
+wave. Another property, relevant for image data, is that the most important
+parts of the image are located at the ends of the spectrum. There is hardly any
+information in the mid-range of the spectrum, which is why it can be concluded
+that the bulk of the information in images lies in the edges. Since the middle
+part of any spectrum is usually smoothed by the extremes at the edges, smoothing
+functions are used to more accurately represent the data we are interested in.
+They work by an element-wise multiplication of the spectrum with a window
+function. Important ones are the triangular function (Bartlett), the raised-cosine
+function (Hamming), the Bessel-based Kaiser window and a simple rectangular function.
+This step is known as \emph{windowing}.
+
+A third transformation is the \emph{discrete cosine transform}. This applies a
+kernel of the form $K_{xy} = \cos(xy+\frac{y}{2})$. Due to the uniform nature of the
+Fourier transform, a lot of image information is quickly lost when coefficients
+are thrown away. The cosine transformation, however, manages to retain much more
+relevant information even when almost 90\% of the coefficients have been thrown
+away. In contrast to the Fourier transform, which is uniform, it discriminates.
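+For illustration, one common discrete form of the cosine kernel is the DCT-II.
+The notation below ($N$ samples $f_x$, coefficients $C_y$) is an assumption made
+here and is not taken from the lecture:
+\[
+  C_y = \sum_{x=0}^{N-1} f_x \cos\!\left[\frac{\pi}{N}\left(xy + \frac{y}{2}\right)\right],
+  \qquad y = 0, \dots, N-1 .
+\]
+Up to the scaling factor $\frac{\pi}{N}$, the argument of the cosine has the
+form $xy+\frac{y}{2}$ of the cosine kernel.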
+Other wavelets of interest are the \emph{Mexican hat} and the \emph{Gabor
+wavelet}. In the area of optics, \emph{Zernike polynomials} are used to compare
+a measurement of a lens to \emph{ideal} optics modeled by a Zernike function. If
+a pre-defined threshold for the error is exceeded, the optics require a closer
+look during the quality assurance process.
+
+While integral transforms allow the original signal to be reconstructed from the
+transformed spectrum, \emph{parametric transforms} do not provide that property.
+One such transformation is the \emph{Radon transformation}, where an axis is
+defined and the luminance values along that axis are summed. The axis is then
+rotated and the process repeated until all angles have been traversed. The
+resulting spectrum is rotation-invariant and the transformation is therefore a
+useful pre-processing step for feature engineering. The \emph{Hough
+transformation} uses the gradients of an image to construct a histogram. The
+information presented in the histogram can be valuable to detect regular
+patterns or long edges in an image.
+
+Applications of the Fourier transform are identifying spectral peaks, tune
+recognition and timbre recognition. All of these use a form of \emph{short-time
+Fourier transform} (STFT). The Fourier transform (FT) can also be used for optical
+flow by shifting the image and recomputing the spectrum (\emph{phase correlation}).
+Both the FT and the cosine transform (CT) are successfully used in music and speech
+recognition with the \emph{mel-frequency cepstral coefficients} (MFCC). The CT is
+used in MPEG-7 for \emph{color histogram encoding} and for texture computation.
 
 \section{Semantic Modeling 200 words}