Add Spectral Features section
parent
78e4465ec4
commit
7c55d2f024
74
sim.tex
@@ -407,7 +407,79 @@ lowest at around 4kHz, where we hear best. High energies are needed to hear very
low frequencies. The threshold on the higher end marks the point at which sounds
become painful, which serves to protect us from damaging our hearing.

\section{Spectral Features}
Because analysis of audio in the time domain is very hard, especially for
identifying overtones, we apply transforms to the original data to extract more
information. One such transform was used by Pierre-Simon Laplace in 1785 to
turn problems requiring difficult operations into other problems which
are solvable with simpler operations. The result of the easier calculation
is then transformed back into the domain of the original problem. This first
type of transformation, an integral transform, multiplies a function by a
kernel and integrates the product to obtain a \emph{spectrum}. Applying the
corresponding inverse kernel to the spectrum yields the original function again
(the back transformation). The kernel which proved suitable for this operation
is given in \eqref{eq:laplace-kernel}.

\begin{equation}
\label{eq:laplace-kernel}
K_{xy} = e^{-xy}
\end{equation}

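As a sketch of how this kernel is applied, assume the common one-sided
convention and write $f$ for the original function and $F$ for its spectrum
(both symbols are introduced here for illustration only). The transform then
reads

\begin{equation}
F(y) = \int_0^\infty f(x)\,e^{-xy}\,dx .
\end{equation}

For the constant function $f(x) = 1$, for example, this gives $F(y) =
\frac{1}{y}$ for $y > 0$, and for $f(x) = e^{-ax}$ it gives $F(y) =
\frac{1}{y+a}$ for $y > -a$.
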
In 1823, instead of just having $-xy$ in the exponent, Fourier proposed to
include the imaginary unit $i$, which gives the kernel $e^{-ixy}$. The kernel
can be rewritten as $\cos(xy) - i\cdot\sin(xy)$, which makes it possible to
interpret the original function much more easily using simple trigonometric
functions. The Fourier transform is a similarity measurement: each coefficient
measures how similar the signal is to one of a set of trigonometric functions
which are overlaid on each other. The imaginary part of the Fourier transform
can be dealt with either by throwing the $\sin$ part of the function away or by
computing the magnitude, that is, the square root of the squared real part plus
the squared imaginary part.

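Written out, and assuming $F(y)$ denotes the complex Fourier coefficient at
frequency $y$ (a symbol introduced here for illustration), the decomposition of
the kernel and the magnitude are

\begin{equation}
e^{-ixy} = \cos(xy) - i\sin(xy)
\end{equation}

\begin{equation}
|F(y)| = \sqrt{\Re\bigl(F(y)\bigr)^2 + \Im\bigl(F(y)\bigr)^2} .
\end{equation}

A coefficient $F(y) = 3 - 4i$, for example, has the magnitude $\sqrt{3^2 +
(-4)^2} = 5$, whereas discarding the $\sin$ part would keep only the real
part $3$.
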
One property of the Fourier transform is that high frequencies are represented
increasingly poorly the more coefficients are thrown away during the process.
Steep changes are spread out over more samples and if only a small fraction of
the coefficients is used, the reconstruction results in a basic sine wave.
Another property, for image information, is that the most important parts of
the image are located at the ends of the spectrum. There is hardly any
information in the mid-range of the spectrum, which is why it can be concluded
that the bulk of the information in images lies in the edges. Since the middle
part of any spectrum is usually smoothed by the extremes at the edges, smoothing
functions are used to represent the data we are interested in more accurately.
They work by multiplying the data element-wise with a window function.
Important ones are the triangular window (Bartlett), the raised-cosine window
(Hamming), the Bessel-function-based window (Kaiser) and the simple rectangular
window. This step is known as \emph{windowing}.

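To make the window functions concrete, the following sketch assumes the common
usage in which a signal segment $x[n]$ of length $N$ is weighted before it is
transformed. The symbols $x$, $w$, $n$ and $N$ are introduced here for
illustration and do not appear in the text above. The windowed segment is

\begin{equation}
x_w[n] = w[n]\cdot x[n], \qquad 0 \le n \le N-1 ,
\end{equation}

with, in their usual textbook form, the rectangular, Bartlett and Hamming
windows given by

\begin{equation}
w_{\mathrm{rect}}[n] = 1, \qquad
w_{\mathrm{Bartlett}}[n] = 1 - \left|\frac{2n}{N-1} - 1\right|, \qquad
w_{\mathrm{Hamming}}[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right) .
\end{equation}

The Bartlett window falls to $0$ at both ends of the segment, while the Hamming
window falls only to $0.08$. The Kaiser window involves a modified Bessel
function and is omitted here.
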
A third transformation is the \emph{discrete cosine transform} (CT). It applies
a kernel of the form $K = \cos(xy+\frac{y}{2})$. Due to the uniform nature of
the Fourier transform, a lot of image information is quickly lost when
coefficients are thrown away. The cosine transform, however, manages to retain
much more relevant information even when almost 90\% of the coefficients have
been thrown away. In contrast to the Fourier transform, which is uniform, it
discriminates. Other kernels of interest are wavelets such as the
\emph{Mexican hat} and the \emph{Gabor wavelet}. In the area of optics,
\emph{Zernike polynomials} are used to compare a measurement of a lens to
\emph{ideal} optics modeled by a Zernike function. If a pre-defined threshold
for the error is exceeded, the optics require a closer look during the quality
assurance process.

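One common concrete instance of such a cosine kernel is the DCT-II, shown here
as a sketch. Whether the text above refers to exactly this variant is an
assumption, and the symbols $x_n$, $X_k$ and $N$ are introduced for
illustration only:

\begin{equation}
X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right],
\qquad k = 0, \dots, N-1 .
\end{equation}

Expanding the argument gives $\frac{\pi}{N}\left(nk + \frac{k}{2}\right)$,
which has the form $xy + \frac{y}{2}$ with $x = n$ and $y = k$, up to the scale
factor $\frac{\pi}{N}$.
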
While integral transforms allow the original signal to be reconstructed from the
transformed spectrum, \emph{parametric transforms} do not provide that property.
One such transformation is the \emph{Radon transform}, where an axis is
defined and the luminance values along that axis are summed. The axis is then
rotated and the process repeated until all angles have been traversed. The
resulting spectrum is rotation-invariant and the transformation is therefore a
useful pre-processing step for feature engineering. The \emph{Hough
transform} uses the gradients of an image to construct a histogram. The
information presented in the histogram can be valuable for detecting regular
patterns or long edges in an image.

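In continuous form, the summation performed by the Radon transform can be
sketched as follows. This assumes the standard textbook definition, with
$f(x,y)$ for the image luminance, $\theta$ for the angle of the axis and $\rho$
for the signed distance of the summation line from the origin (all symbols are
introduced here for illustration):

\begin{equation}
R(\rho, \theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
f(x,y)\,\delta(x\cos\theta + y\sin\theta - \rho)\,dx\,dy .
\end{equation}

The Dirac delta $\delta$ selects the points on the line $x\cos\theta +
y\sin\theta = \rho$, so each value $R(\rho, \theta)$ is the accumulated
luminance along one line. Stepping through $\theta$ corresponds to rotating the
axis.
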
Applications of the Fourier transform (FT) include identifying spectral peaks,
tune recognition and timbre recognition. All of these use a form of the
\emph{short-time Fourier transform} (STFT). The FT can also be used for optical
flow by shifting the image and recomputing the spectrum (\emph{phase
correlation}). Both the FT and the cosine transform (CT) are successfully used
in music recognition and speech recognition with the \emph{mel-frequency
cepstral coefficients} (MFCC). The CT is used in MPEG-7 for \emph{color
histogram encoding} and for texture computation.

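As a sketch of the short-time Fourier transform in its usual discrete-time
form, assume a signal $x[n]$, a window function $w$ and a window position $m$
(symbols introduced here for illustration). The spectrum is then computed
separately for each position of the window:

\begin{equation}
X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n]\,w[n-m]\,e^{-i\omega n} .
\end{equation}

Because $w$ is non-zero only on a short interval, each $X(m, \omega)$ describes
the frequency content of the signal around position $m$, which is what makes it
possible to follow spectral peaks over time.
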
\section{Semantic Modeling 200 words}