\section{Feature Engineering 500 words}
Contrary to popular opinion, the rise of deep learning methods in areas such as
object recognition has not superseded the classical approach of feature
engineering everywhere. Particularly in the audio domain and for motion
detection in videos, feature engineering is still the dominant method. One
reason is that classical methods require much less processing, which can be
beneficial or even crucial for certain applications (e.g.\ edge computing).

Feature engineering is part of the pipeline which transforms input data into
classes and labels for that data. After modeling, feature extraction maps the
input into the feature space. After the classification step, we end up with
labels corresponding to the input data and the features we want. In practice,
feature engineering deals with analyzing input signals. Common features of
interest during analysis are the loudness (amplitude), rhythm or motion of a
signal.

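The pipeline described above can be sketched in a few lines. This is a toy
illustration, not a real system: the two features (mean absolute amplitude as a
loudness proxy, plus signal range) and the threshold classifier are made-up
placeholders.

```python
def extract_features(signal):
    """Map a raw input signal to a point in feature space.

    Toy features: mean absolute amplitude (a loudness proxy)
    and the range of the signal.
    """
    loudness = sum(abs(x) for x in signal) / len(signal)
    spread = max(signal) - min(signal)
    return (loudness, spread)

def classify(features):
    """Toy classifier: threshold on the loudness feature."""
    loudness, _ = features
    return "loud" if loudness > 0.5 else "quiet"

# Pipeline: input signal -> feature space -> label
signal = [0.9, -0.8, 0.7, -0.9, 0.8]
label = classify(extract_features(signal))
```

A real system would replace both placeholder functions with the extraction and
classification methods discussed in the following paragraphs.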
There are four main features of interest when analyzing visual data: color,
texture, shape and foreground versus background. Starting with color, the first
idea that springs to mind is to use the RGB color space to detect specific
colors. Depending on the application, this might not be the best choice,
because the three channels represent the \emph{pure} primaries, and different
hues of a color require a change of all three parameters (red, green and
blue). Other color spaces such as hue, saturation and value (HSV) are better
suited for color recognition, since we are usually only interested in the hue
of a color and can therefore better generalize the detection space. Another
option is the \emph{CIE XYZ} color space, which is applicable to situations
where adherence to how human vision works is beneficial. For broadcasting
applications, color is often encoded using \emph{YCrCb}, where \emph{Y}
represents the luma (lightness) and \emph{Cr} and \emph{Cb} represent the
color differences $R-Y$ and $B-Y$ respectively. To find a dominant color
within an image, we can choose to only look at certain sections of the frame,
e.g.\ the center or the largest continuous region of color. Another approach
is to use a color histogram to count the number of different hues within the
frame.

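As a sketch of the hue-histogram approach, the following converts RGB pixels to
HSV with Python's standard \texttt{colorsys} module and counts hues in coarse
bins to find the dominant one. The bin count, the saturation cut-off and the
pixel values are made up for illustration.

```python
import colorsys
from collections import Counter

def dominant_hue(pixels, bins=12):
    """Count the hues of RGB pixels (channel values in 0..1) in
    coarse bins and return the central hue (degrees) of the most
    common bin."""
    counts = Counter()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        if s > 0.1:                      # ignore near-grey pixels
            counts[int(h * bins) % bins] += 1
    bin_index, _ = counts.most_common(1)[0]
    return (bin_index + 0.5) * 360 / bins

# Mostly reddish pixels with one blue outlier
pixels = [(0.9, 0.1, 0.1), (0.8, 0.2, 0.1), (0.9, 0.2, 0.0), (0.1, 0.1, 0.9)]
hue = dominant_hue(pixels)   # dominant bin covers the red hues
```

Generalizing detection over hue is exactly what makes HSV convenient here: all
three reddish pixels fall into the same hue bin even though their RGB triples
differ in every channel.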
Recognizing objects by their texture can be divided into three different
methods. One approach is to look at the direction pixels are oriented towards
to get a measure of \emph{directionality}. Secondly, \emph{rhythm} allows us
to detect whether a patch of information (micro block) is repeated in its
neighborhood through \emph{autocorrelation}. Autocorrelation takes one
neighborhood and compares it, usually using a generalized distance measure, to
all other neighborhoods. If the similarity exceeds a certain threshold, there
is a high probability that a rhythm exists. Third, \emph{coarseness} can be
detected by applying a similar process, but with different window sizes, to
determine whether information is lost. If there is no loss of information in
the compressed (smaller) window, the image information is coarse.

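The rhythm test can be sketched as follows, shown in one dimension for brevity
(image micro blocks would be 2D). The mean absolute difference stands in for
the generalized distance measure, and the block size and threshold are
arbitrary illustrative choices.

```python
def block_distance(a, b):
    """Mean absolute difference between two equally sized blocks
    (a simple stand-in for a generalized distance measure)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def has_rhythm(signal, block_size, threshold=0.1):
    """Check whether the first micro block repeats elsewhere:
    slide a window over the signal and compare it to the block."""
    block = signal[:block_size]
    for offset in range(block_size, len(signal) - block_size + 1):
        window = signal[offset:offset + block_size]
        if block_distance(block, window) < threshold:
            return True
    return False

# A pattern repeating with period 3 is detected as rhythmic;
# an irregular signal is not.
rhythmic = has_rhythm([1.0, 0.0, 0.5] * 4, block_size=3)
noisy = has_rhythm([0.9, 0.1, 0.7, 0.3, 0.5, 0.2, 0.8, 0.4, 0.6],
                   block_size=3)
```

The coarseness test mentioned above follows the same pattern, except that the
block is compared against downscaled versions of itself at several window
sizes instead of against shifted neighborhoods.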
Shape detection can be realized using \emph{kernels} of different sizes and
with different values. An edge detection algorithm might convolve a Sobel
matrix with neighborhoods of the image. If the filter response is high, there
is a high probability of an edge in that neighborhood.

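A minimal sketch of the Sobel idea: applying the horizontal 3x3 Sobel kernel
to a tiny hand-made image gives a large response where the kernel straddles a
vertical edge and a zero response in a uniform region. The image values are
invented for illustration.

```python
# Horizontal Sobel kernel: responds strongly to vertical edges
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def sobel_response(image, row, col):
    """Apply the 3x3 Sobel kernel centred on (row, col) and
    return the absolute filter response."""
    total = 0
    for dr in range(-1, 2):
        for dc in range(-1, 2):
            total += SOBEL_X[dr + 1][dc + 1] * image[row + dr][col + dc]
    return abs(total)

# Tiny image with a vertical edge between columns 2 and 3
image = [[0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9]]
on_edge = sobel_response(image, 1, 3)   # kernel straddles the edge
flat    = sobel_response(image, 1, 4)   # uniform region, no edge
```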
Foreground and background detection relies on the assumption that the
coarseness is on average higher for the background than for the foreground.
This only makes sense if videos have been recorded with a shallow depth of
field, so that the background is much more blurred than the foreground.

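One crude way to exploit this blur assumption, using local variance as a
sharpness proxy instead of a full coarseness measure (a simplification, not
the method described above): blurred background patches vary less than sharply
focused foreground patches. The patch values are hypothetical.

```python
def local_variance(patch):
    """Variance of pixel values in a patch -- a crude sharpness
    proxy: blurred regions vary less than focused ones."""
    mean = sum(patch) / len(patch)
    return sum((p - mean) ** 2 for p in patch) / len(patch)

# Hypothetical patches: sharp foreground contrast vs. blurred background
foreground_patch = [0, 0, 9, 9, 0, 0, 9, 9]   # strong local contrast
background_patch = [4, 5, 4, 5, 4, 5, 4, 5]   # smoothed-out values

is_foreground = local_variance(foreground_patch) > local_variance(background_patch)
```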
For audio feature extraction, three properties are of relevance: loudness,
fundamental frequency and rhythm. Specific audio sources have a distinct
loudness to them; for example, classical music has a higher standard deviation
of loudness than metal. The fundamental frequency can be particularly helpful
in distinguishing speech from music by analyzing the \emph{zero crossing rate}
(ZCR). Speech has a lower ZCR than music, because there is a limit on how fast
humans can speak. Audio signals can often be made up of distinct patterns,
which are described by the attack, decay, sustain and release (ADSR) model.
This model is effective in rhythm detection.

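The ZCR itself is simple to compute: count how often consecutive samples
change sign. The two example signals below are invented stand-ins for a
fast-oscillating "music-like" and a slowly varying "speech-like" waveform.

```python
def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

fast = [1, -1, 1, -1, 1, -1, 1, -1]   # alternates every sample
slow = [1, 1, 1, 1, -1, -1, -1, -1]   # crosses zero only once
zcr_fast = zero_crossing_rate(fast)
zcr_slow = zero_crossing_rate(slow)
```

On real audio the ZCR would be computed per short frame and compared against a
threshold tuned to separate the two signal classes.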
Motion in videos is easily detected using cross-correlation between
consecutive frames. Similarly to cross-correlation in other domains, a
similarity measure is calculated from two frames; if the similarity drops
below a threshold (i.e.\ the frames differ sufficiently), there is movement.
The similarity measurements can be aggregated to provide a robust detection of
camera movement.

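A minimal sketch of this thresholding step, using a normalised mean absolute
pixel difference as a simple stand-in for cross-correlation; the frames are
flattened lists of 8-bit pixel values and the threshold is an arbitrary
illustrative choice.

```python
def frame_similarity(frame_a, frame_b):
    """Similarity of two frames of 8-bit pixels, based on mean
    absolute pixel difference (1.0 means identical). A simple
    stand-in for normalised cross-correlation."""
    diffs = [abs(a - b) for a, b in zip(frame_a, frame_b)]
    return 1.0 - sum(diffs) / (255 * len(diffs))

def motion_detected(frame_a, frame_b, threshold=0.95):
    """Flag motion when similarity drops below the threshold."""
    return frame_similarity(frame_a, frame_b) < threshold

# Nearly identical frames vs. frames with large pixel changes
static_prev, static_next = [10, 20, 30, 40], [10, 20, 31, 40]
moving_prev, moving_next = [10, 20, 30, 40], [200, 20, 30, 200]
```

Aggregating these per-frame-pair decisions over a window of frames, as the
text suggests, smooths out single-frame noise when detecting camera movement.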
\section{Classification 500 words}
\section{Evaluation 200 words}