Add Spectral Features section
parent 78e4465ec4
commit 7c55d2f024
74 sim.tex
@@ -407,7 +407,79 @@ lowest at around 4kHz, where we hear best. High energies are needed to hear very
low frequencies. The threshold on the higher end marks the point at which sounds
become painful and it seeks to protect us from damaging our hearing.
\section{Spectral Features}

Because audio is very hard to analyze in the time domain, especially when
identifying overtones, we apply transforms to the original data to extract more
information. One such transform was used by Pierre-Simon Laplace in 1785 to
turn problems requiring difficult operations into other problems that are
solvable with simpler operations. The results of the easier calculation are
then transformed back into the domain of the original problem. This first type
of transform multiplies a function by a kernel and integrates the product,
which yields a \emph{spectrum}. Applying the corresponding inverse kernel to
the spectrum recovers the original function (back transformation). The kernel
that proved suitable for this operation is given in \eqref{eq:laplace-kernel}.

\begin{equation}
\label{eq:laplace-kernel}
K_{xy} = e^{-xy}
\end{equation}

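As a sketch of how this kernel is used, assume a function $f(x)$ defined for
$x \geq 0$ (the symbols $f$ and $F$ are introduced here for illustration only).
The spectrum is then obtained by weighting $f$ with the kernel and integrating
over $x$:

\begin{equation}
% f and F are illustrative names, not taken from the original notes
F(y) = \int_{0}^{\infty} f(x)\, K_{xy}\, \mathrm{d}x
     = \int_{0}^{\infty} f(x)\, e^{-xy}\, \mathrm{d}x
\end{equation}

For example, the constant function $f(x) = 1$ gives $F(y) = 1/y$ for $y > 0$.
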
In 1823, instead of just having $-xy$ in the exponent, Fourier proposed to
multiply it by the imaginary unit $i$, which gives the kernel $e^{-ixy}$. By
Euler's formula this kernel can be rewritten as $\cos(xy) - i\cdot\sin(xy)$,
which makes it possible to interpret the original function much more easily
using simple trigonometric functions. The Fourier transform can be understood
as a similarity measurement: each coefficient measures how similar the signal
is to one of a set of trigonometric functions which are overlaid on each other.
The imaginary part of the Fourier transform can be dealt with either by
discarding the $\sin$ part of the function or by computing the magnitude, that
is, the square root of the squared real part plus the squared imaginary part.

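To make the magnitude computation concrete, write a single Fourier coefficient
as $F(y) = a + i\,b$, where $a$ and $b$ are names chosen here for its real and
imaginary parts:

\begin{equation}
% a = real part, b = imaginary part (illustrative names)
|F(y)| = \sqrt{a^2 + b^2}
\end{equation}

A coefficient $F(y) = 3 - 4i$, for instance, has the magnitude
$\sqrt{3^2 + (-4)^2} = 5$, whereas discarding the $\sin$ part would keep only
the real part $3$.
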
One property of the Fourier transform is that high frequencies are represented
increasingly poorly the more coefficients are thrown away during the process.
Steep changes in the signal are spread out over more samples and, if only a
small fraction of the coefficients is used, the transformation results in a
basic sine wave. Another property, for image information, is that the most
important parts of the image are located at the ends of the spectrum. There is
hardly any information in the mid-range of the spectrum, which is why it can be
concluded that the bulk of the information in images lies in the edges. Since
the middle part of any spectrum is usually smoothed by the extremes at the
edges, smoothing functions are used to represent the data we are interested in
more accurately. They work by multiplying the data element-wise with a window
function. Important ones are the triangular function (Bartlett), the raised
cosine function (Hamming), the Bessel-based function (Kaiser) and a simple
rectangular function. This step is known as \emph{windowing}.

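As an illustration of windowing, assume a block of $N$ samples $x_n$ with the
sample index $n = 0, \dots, N-1$ (this notation is introduced here and is not
taken from the text above). The windowed block is the element-wise product

\begin{equation}
\tilde{x}_n = w_n \cdot x_n
\end{equation}

and commonly used definitions of three of the windows named above are

% align assumes amsmath, which \eqref already requires
\begin{align}
w_n^{\mathrm{rect}}     &= 1 \\
w_n^{\mathrm{Bartlett}} &= 1 - \left| \frac{2n}{N-1} - 1 \right| \\
w_n^{\mathrm{Hamming}}  &= 0.54 - 0.46 \cos\left( \frac{2\pi n}{N-1} \right)
\end{align}

The Bartlett and Hamming windows fall off towards both ends of the block, which
suppresses the artificial jump at the block boundaries.
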
A third transformation is the \emph{discrete cosine transform}. This applies a
kernel of the form $K = \cos(xy+\frac{y}{2})$. Due to the uniform nature of the
Fourier transform, a lot of image information is quickly lost when coefficients
are thrown away. The cosine transform, however, manages to retain much more
relevant information even when almost 90\% of the coefficients have been thrown
away. In contrast to the Fourier transform, which is uniform, it discriminates.
Other kernels of interest are wavelets such as the \emph{Mexican hat} and the
\emph{Gabor wavelet}. In the area of optics, \emph{Zernike polynomials} are
used to compare a measurement of a lens to \emph{ideal} optics modeled by a
Zernike function. If a pre-defined threshold for the error is exceeded, the
optics require a closer look during the quality assurance process.

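One common concrete form of the cosine kernel is the DCT-II, sketched here
under the assumption of a block $x_n$ with $N$ samples and a coefficient index
$k$ (both symbols are introduced for illustration):

\begin{equation}
% DCT-II without normalization factors
X_k = \sum_{n=0}^{N-1} x_n
      \cos\left( \frac{\pi}{N} \left( n k + \frac{k}{2} \right) \right),
\qquad k = 0, \dots, N-1
\end{equation}

With $x = n$ and $y = k$ this matches the kernel $\cos(xy + \frac{y}{2})$ up to
the scaling factor $\frac{\pi}{N}$.
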
While integral transforms allow the original signal to be reconstructed from the
transformed spectrum, \emph{parametric transforms} do not provide that property.
One such transformation is the \emph{Radon transform}, where an axis is
defined and the luminance values along that axis are added. The axis is then
rotated and the process repeated until all angles have been traversed. The
resulting spectrum is rotation-invariant and the transformation is therefore a
useful pre-processing step for feature engineering. The \emph{Hough
transform} uses the gradients of an image to construct a histogram. The
information presented in the histogram can be valuable for detecting regular
patterns or long edges in an image.

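The summation along a rotated axis can be written down as follows, assuming an
image with luminance $I(u, v)$, a rotation angle $\theta$ and an offset $s$ of
the summation line from the origin (all symbols are chosen here for
illustration):

\begin{equation}
% continuous form; for pixel images the integrals become sums
R(\theta, s) = \iint I(u, v)\,
               \delta(u \cos\theta + v \sin\theta - s)\,
               \mathrm{d}u\, \mathrm{d}v
\end{equation}

The Dirac delta $\delta$ restricts the integral to the line
$u \cos\theta + v \sin\theta = s$. Evaluating $R$ for all
$\theta \in [0, \pi)$ corresponds to rotating the axis through all angles.
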
Applications of the Fourier transform (FT) include identifying spectral peaks,
tune recognition and timbre recognition. All of these use a form of the
\emph{short-time Fourier transform} (STFT). The FT can also be used for optical
flow by shifting the image and recomputing the spectrum (\emph{phase
correlation}). Both the FT and the cosine transform (CT) are successfully used
in music recognition and speech recognition with the \emph{mel-frequency
cepstral coefficients} (MFCC). The CT is used in MPEG-7 for \emph{color
histogram encoding} and for texture computation.

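A minimal sketch of the STFT, assuming a discrete signal $x_n$, a window $w_n$
of length $N$ and a hop size $H$ between consecutive frames (all of these are
introduced here for illustration):

\begin{equation}
% m = frame index, k = frequency bin (illustrative names)
X_{m,k} = \sum_{n=0}^{N-1} x_{n + mH}\, w_n\, e^{-i 2\pi k n / N}
\end{equation}

Here $m$ indexes the frame and $k$ the frequency bin, so $|X_{m,k}|$ shows how
the spectrum changes over time.
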
\section{Semantic Modeling 200 words}