\section{Feature Engineering}

Contrary to popular opinion, the rise of deep learning methods in areas such as
object recognition has not superseded the classical approach of feature
engineering in other areas. Particularly in the audio domain and for motion
detection in videos, for example, feature engineering is still the dominant
method. This is highlighted by the fact that classical methods require much less
processing, which can be beneficial or even crucial for certain applications
(e.g. edge computing).

Feature engineering is part of the pipeline that transforms input data into
classes and labels for that data. After modeling comes feature extraction,
which maps the input onto points in a feature space. After the classification
step, we end up with labels corresponding to the input data and the features we
want. In practice, feature engineering deals with analyzing input signals.
Common features one might be interested in during analysis are the loudness
(amplitude), rhythm or motion of a signal.

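As a minimal sketch of this pipeline, the snippet below extracts a single
illustrative feature (loudness as RMS amplitude) from a signal and maps it to a
label. The feature choice, the threshold and the label names are assumptions
made for illustration, not part of any particular system.

```python
import numpy as np

def extract_features(signal):
    # One illustrative feature: loudness measured as RMS amplitude.
    return np.array([np.sqrt(np.mean(signal ** 2))])

def classify(features, threshold=0.5):
    # Hypothetical classification rule mapping the feature vector to a label.
    return "loud" if features[0] > threshold else "quiet"

quiet = 0.1 * np.sin(np.linspace(0, 20 * np.pi, 1000))
loud = 0.9 * np.sin(np.linspace(0, 20 * np.pi, 1000))
print(classify(extract_features(quiet)))  # quiet
print(classify(extract_features(loud)))   # loud
```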
There are four main features of interest when analyzing visual data: color,
texture, shape and foreground versus background. Starting with color, the first
thing that springs to mind is to use the RGB color space to detect specific
colors. Depending on the application, this might not be the best choice, since
the three channels represent the \emph{pure} primaries and different hues of a
color require a change of all three parameters (red, green and blue). Other
color spaces such as hue, saturation and value (HSV) are better suited for
color recognition, since we are usually only interested in the hue of a color
and can therefore better generalize the detection space. Another option is the
\emph{CIE XYZ} color space, which is applicable to situations where adherence
to how human vision works is beneficial. For broadcasting applications, color
is often encoded using \emph{YCrCb}, where \emph{Y} represents the luma
(lightness) and \emph{Cr} and \emph{Cb} represent the color differences $R-Y$
and $B-Y$ respectively. To find a dominant color within an image, we can choose
to only look at certain sections of the frame, e.g. the center or the largest
continuous region of color. Another approach is to use a color histogram to
count the number of pixels of each hue within the frame.

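A dominant hue can be estimated with a hue histogram as described above. The
sketch below, using only NumPy and the standard-library colorsys module,
converts RGB pixels to HSV and returns the centre of the most populated hue
bin; the bin count and the synthetic test frame are arbitrary choices for
illustration.

```python
import numpy as np
import colorsys

def dominant_hue(rgb_image, bins=12):
    # rgb_image: array of shape (H, W, 3) with floats in [0, 1].
    hues = np.array([
        colorsys.rgb_to_hsv(*px)[0]          # keep only the hue channel
        for px in rgb_image.reshape(-1, 3)
    ])
    hist, edges = np.histogram(hues, bins=bins, range=(0.0, 1.0))
    i = np.argmax(hist)                      # most populated hue bin
    return 0.5 * (edges[i] + edges[i + 1])   # centre of that bin

# Synthetic frame: mostly red with a small green patch.
img = np.zeros((8, 8, 3))
img[..., 0] = 1.0                # red everywhere
img[:2, :2] = [0.0, 1.0, 0.0]    # small green corner
print(dominant_hue(img))         # centre of the red hue bin
```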
Recognizing objects by their texture can be divided into three different
methods. One approach is to look at the direction local image gradients are
oriented towards to get a measure of \emph{directionality}. Secondly,
\emph{rhythm} allows us to detect whether a patch of information (micro block)
is repeated in its neighborhood through \emph{autocorrelation}.
Autocorrelation takes one neighborhood and compares it, usually using a
generalized distance measure, to all other neighborhoods. If the similarity
exceeds a certain threshold, there is a high probability that a rhythm exists.
Third, \emph{coarseness} can be detected by applying a similar process, but
looking at different window sizes to determine whether there is any loss of
information. If there is no loss of information in the compressed (smaller)
window, the image information is coarse.

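The rhythm check can be sketched as follows: one micro block is compared,
via a Euclidean distance, against every same-sized neighborhood of the image,
and a repeated near-perfect match suggests a rhythm. The distance measure, the
threshold and the tiled test pattern are illustrative assumptions.

```python
import numpy as np

def has_rhythm(image, block, threshold=1.0):
    # Slide over the image and compute the Euclidean distance between the
    # reference micro block and each same-sized neighborhood. The block
    # always matches itself once, so a rhythm needs at least a second match.
    bh, bw = block.shape
    matches = 0
    for y in range(image.shape[0] - bh + 1):
        for x in range(image.shape[1] - bw + 1):
            patch = image[y:y + bh, x:x + bw]
            if np.linalg.norm(patch - block) < threshold:
                matches += 1
    return matches >= 2

# A horizontally tiled stripe pattern repeats every 4 columns.
tile = np.array([[0, 0, 1, 1]] * 4, dtype=float)
image = np.tile(tile, (1, 4))    # 4 x 16 image
print(has_rhythm(image, tile))   # True
```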
Shape detection can be realized using \emph{kernels} of different sizes and with
different values. An edge detection algorithm might use a Sobel matrix to
compare neighborhoods of an image. If the response is high, there is a high
probability of there being an edge in that neighborhood.

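A minimal sketch of this idea: the standard horizontal Sobel kernel is
correlated with a 3x3 neighborhood, and a large response indicates a vertical
edge. The synthetic half-dark, half-bright image is an assumption made for
illustration.

```python
import numpy as np

# Sobel kernel responding to horizontal intensity changes (vertical edges).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def sobel_response(gray, y, x):
    # Correlate the 3x3 neighborhood around (y, x) with the kernel.
    patch = gray[y - 1:y + 2, x - 1:x + 2]
    return float(np.sum(patch * SOBEL_X))

# Synthetic image: dark left half, bright right half, i.e. a vertical edge.
gray = np.zeros((5, 8))
gray[:, 4:] = 1.0
print(sobel_response(gray, 2, 3))  # strong response on the edge
print(sobel_response(gray, 2, 1))  # no response in the flat region
```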
Foreground and background detection relies on the assumption that the coarseness
is on average higher for the background than for the foreground. This only makes
sense if videos have been recorded with a shallow depth of field so that the
background is much more blurred than the foreground.

For audio feature extraction, three properties are of relevance: loudness,
fundamental frequency and rhythm. Specific audio sources have a distinct
loudness profile; for example, classical music has a higher standard deviation
of loudness than metal. The fundamental frequency can be particularly helpful
in distinguishing speech from music by analyzing the \emph{zero crossing rate}
(ZCR). Speech has a lower ZCR than music, because there is a limit on how fast
humans can speak. Audio signals can often be made up of distinct patterns which
are described by the attack, decay, sustain and release (ADSR) model. This
model is effective in rhythm detection.

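The ZCR itself is straightforward to compute: it is the fraction of adjacent
sample pairs whose signs differ. The sketch below compares two synthetic sine
tones; the sample rate and frequencies are arbitrary stand-ins for real
speech and music signals.

```python
import numpy as np

def zero_crossing_rate(signal):
    # Fraction of adjacent sample pairs whose signs differ.
    signs = np.sign(signal)
    return float(np.mean(signs[1:] != signs[:-1]))

t = np.linspace(0, 1, 8000, endpoint=False)
speech_like = np.sin(2 * np.pi * 150 * t)   # low fundamental (~150 Hz)
music_like = np.sin(2 * np.pi * 2000 * t)   # high-frequency content
print(zero_crossing_rate(speech_like) < zero_crossing_rate(music_like))  # True
```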
Motion in videos is easily detected using cross-correlation between consecutive
frames. Similarly to cross-correlation in other domains, a similarity measure is
calculated from two frames; if the similarity falls below a threshold, the
frames differ and there is movement. The similarity measurements can be
aggregated over time to provide a robust detection of camera movement.

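A minimal sketch of this, assuming normalized cross-correlation as the
similarity measure: identical frames score close to 1, while shifted content
scores much lower. The threshold and the randomly generated test frames are
illustrative assumptions.

```python
import numpy as np

def motion_score(prev, curr):
    # Normalized cross-correlation between two grayscale frames;
    # values near 1 mean the frames are almost identical (no motion).
    a = prev - prev.mean()
    b = curr - curr.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom else 1.0

def has_motion(prev, curr, threshold=0.9):
    # Movement is flagged when the similarity drops below the threshold.
    return motion_score(prev, curr) < threshold

rng = np.random.default_rng(0)
frame = rng.random((16, 16))
shifted = np.roll(frame, 3, axis=1)  # simulated horizontal camera movement
print(has_motion(frame, frame))      # False: identical frames
print(has_motion(frame, shifted))    # True: content has moved
```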
\section{Classification}

\section{Evaluation}
