\section{Feature Engineering 500 words}
Contrary to popular opinion, the rise of deep learning methods in areas such as
object recognition has not superseded the classical approach of feature
engineering everywhere. Particularly in the audio domain and for motion
detection in videos, feature engineering is still the dominant method. One
reason is that classical methods require much less processing, which can be
beneficial or even crucial for certain applications (e.g.\ edge computing).

Feature engineering is part of the pipeline which transforms input data into
classes and labels for that data. After modeling, feature extraction maps the
input into the feature space. After the classification step, we end up with
labels corresponding to the input data and the features we want. In practice,
feature engineering deals with analyzing input signals. Common features of
interest during analysis are the loudness (amplitude), rhythm or motion of a
signal.

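The pipeline described above can be sketched in a few lines. This is a toy
illustration, not a real system: the two features (mean absolute amplitude as a
loudness proxy, plus signal range) and the threshold classifier are made-up
placeholders.

```python
def extract_features(signal):
    """Map a raw input signal to a point in feature space.

    Toy features: mean absolute amplitude (a loudness proxy)
    and the range of the signal.
    """
    loudness = sum(abs(x) for x in signal) / len(signal)
    spread = max(signal) - min(signal)
    return (loudness, spread)

def classify(features):
    """Toy classifier: threshold on the loudness feature."""
    loudness, _ = features
    return "loud" if loudness > 0.5 else "quiet"

# Pipeline: input signal -> feature space -> label
signal = [0.9, -0.8, 0.7, -0.9, 0.8]
label = classify(extract_features(signal))
```

A real system would replace both placeholder functions with the extraction and
classification methods discussed in the following paragraphs.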
There are four main features of interest when analyzing visual data: color,
texture, shape and foreground versus background. Starting with color, the first
idea that springs to mind is to use the RGB color space to detect specific
colors. Depending on the application, this might not be the best choice,
because the three channels represent the \emph{pure} primaries, and different
hues of a color require a change of all three parameters (red, green and
blue). Other color spaces such as hue, saturation and value (HSV) are better
suited for color recognition, since we are usually only interested in the hue
of a color and can therefore better generalize the detection space. Another
option is the \emph{CIE XYZ} color space, which is applicable to situations
where adherence to how human vision works is beneficial. For broadcasting
applications, color is often encoded using \emph{YCrCb}, where \emph{Y}
represents the luma (lightness) and \emph{Cr} and \emph{Cb} represent the
color differences $R-Y$ and $B-Y$ respectively. To find a dominant color
within an image, we can choose to only look at certain sections of the frame,
e.g.\ the center or the largest continuous region of color. Another approach
is to use a color histogram to count the number of different hues within the
frame.

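As a sketch of the hue-histogram approach, the following converts RGB pixels to
HSV with Python's standard \texttt{colorsys} module and counts hues in coarse
bins to find the dominant one. The bin count, the saturation cut-off and the
pixel values are made up for illustration.

```python
import colorsys
from collections import Counter

def dominant_hue(pixels, bins=12):
    """Count the hues of RGB pixels (channel values in 0..1) in
    coarse bins and return the central hue (degrees) of the most
    common bin."""
    counts = Counter()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        if s > 0.1:                      # ignore near-grey pixels
            counts[int(h * bins) % bins] += 1
    bin_index, _ = counts.most_common(1)[0]
    return (bin_index + 0.5) * 360 / bins

# Mostly reddish pixels with one blue outlier
pixels = [(0.9, 0.1, 0.1), (0.8, 0.2, 0.1), (0.9, 0.2, 0.0), (0.1, 0.1, 0.9)]
hue = dominant_hue(pixels)   # dominant bin covers the red hues
```

Generalizing detection over hue is exactly what makes HSV convenient here: all
three reddish pixels fall into the same hue bin even though their RGB triples
differ in every channel.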
Recognizing objects by their texture can be divided into three different
methods. One approach is to look at the direction pixels are oriented towards
to get a measure of \emph{directionality}. Secondly, \emph{rhythm} allows us
to detect whether a patch of information (micro block) is repeated in its
neighborhood through \emph{autocorrelation}. Autocorrelation takes one
neighborhood and compares it, usually using a generalized distance measure, to
all other neighborhoods. If the similarity exceeds a certain threshold, there
is a high probability that a rhythm exists. Third, \emph{coarseness} can be
detected by applying a similar process, but with different window sizes, to
determine whether information is lost. If there is no loss of information in
the compressed (smaller) window, the image information is coarse.

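The rhythm test can be sketched as follows, shown in one dimension for brevity
(image micro blocks would be 2D). The mean absolute difference stands in for
the generalized distance measure, and the block size and threshold are
arbitrary illustrative choices.

```python
def block_distance(a, b):
    """Mean absolute difference between two equally sized blocks
    (a simple stand-in for a generalized distance measure)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def has_rhythm(signal, block_size, threshold=0.1):
    """Check whether the first micro block repeats elsewhere:
    slide a window over the signal and compare it to the block."""
    block = signal[:block_size]
    for offset in range(block_size, len(signal) - block_size + 1):
        window = signal[offset:offset + block_size]
        if block_distance(block, window) < threshold:
            return True
    return False

# A pattern repeating with period 3 is detected as rhythmic;
# an irregular signal is not.
rhythmic = has_rhythm([1.0, 0.0, 0.5] * 4, block_size=3)
noisy = has_rhythm([0.9, 0.1, 0.7, 0.3, 0.5, 0.2, 0.8, 0.4, 0.6],
                   block_size=3)
```

The coarseness test mentioned above follows the same pattern, except that the
block is compared against downscaled versions of itself at several window
sizes instead of against shifted neighborhoods.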
Shape detection can be realized using \emph{kernels} of different sizes and
with different values. An edge detection algorithm might convolve a Sobel
matrix with neighborhoods of the image. If the filter response is high, there
is a high probability of an edge in that neighborhood.

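A minimal sketch of the Sobel idea: applying the horizontal 3x3 Sobel kernel
to a tiny hand-made image gives a large response where the kernel straddles a
vertical edge and a zero response in a uniform region. The image values are
invented for illustration.

```python
# Horizontal Sobel kernel: responds strongly to vertical edges
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def sobel_response(image, row, col):
    """Apply the 3x3 Sobel kernel centred on (row, col) and
    return the absolute filter response."""
    total = 0
    for dr in range(-1, 2):
        for dc in range(-1, 2):
            total += SOBEL_X[dr + 1][dc + 1] * image[row + dr][col + dc]
    return abs(total)

# Tiny image with a vertical edge between columns 2 and 3
image = [[0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9],
         [0, 0, 0, 9, 9, 9]]
on_edge = sobel_response(image, 1, 3)   # kernel straddles the edge
flat    = sobel_response(image, 1, 4)   # uniform region, no edge
```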
Foreground and background detection relies on the assumption that the
coarseness is on average higher for the background than for the foreground.
This only makes sense if videos have been recorded with a shallow depth of
field, so that the background is much more blurred than the foreground.

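One crude way to exploit this blur assumption, using local variance as a
sharpness proxy instead of a full coarseness measure (a simplification, not
the method described above): blurred background patches vary less than sharply
focused foreground patches. The patch values are hypothetical.

```python
def local_variance(patch):
    """Variance of pixel values in a patch -- a crude sharpness
    proxy: blurred regions vary less than focused ones."""
    mean = sum(patch) / len(patch)
    return sum((p - mean) ** 2 for p in patch) / len(patch)

# Hypothetical patches: sharp foreground contrast vs. blurred background
foreground_patch = [0, 0, 9, 9, 0, 0, 9, 9]   # strong local contrast
background_patch = [4, 5, 4, 5, 4, 5, 4, 5]   # smoothed-out values

is_foreground = local_variance(foreground_patch) > local_variance(background_patch)
```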
For audio feature extraction, three properties are of relevance: loudness,
fundamental frequency and rhythm. Specific audio sources have a distinct
loudness to them; for example, classical music has a higher standard deviation
of loudness than metal. The fundamental frequency can be particularly helpful
in distinguishing speech from music by analyzing the \emph{zero crossing rate}
(ZCR). Speech has a lower ZCR than music, because there is a limit on how fast
humans can speak. Audio signals can often be made up of distinct patterns,
which are described by the attack, decay, sustain and release (ADSR) model.
This model is effective in rhythm detection.

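The ZCR itself is simple to compute: count how often consecutive samples
change sign. The two example signals below are invented stand-ins for a
fast-oscillating "music-like" and a slowly varying "speech-like" waveform.

```python
def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

fast = [1, -1, 1, -1, 1, -1, 1, -1]   # alternates every sample
slow = [1, 1, 1, 1, -1, -1, -1, -1]   # crosses zero only once
zcr_fast = zero_crossing_rate(fast)
zcr_slow = zero_crossing_rate(slow)
```

On real audio the ZCR would be computed per short frame and compared against a
threshold tuned to separate the two signal classes.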
Motion in videos is easily detected using cross-correlation between
consecutive frames. Similarly to cross-correlation in other domains, a
similarity measure is calculated from two frames; if the similarity drops
below a threshold (i.e.\ the frames differ sufficiently), there is movement.
The similarity measurements can be aggregated to provide a robust detection of
camera movement.

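A minimal sketch of this thresholding step, using a normalised mean absolute
pixel difference as a simple stand-in for cross-correlation; the frames are
flattened lists of 8-bit pixel values and the threshold is an arbitrary
illustrative choice.

```python
def frame_similarity(frame_a, frame_b):
    """Similarity of two frames of 8-bit pixels, based on mean
    absolute pixel difference (1.0 means identical). A simple
    stand-in for normalised cross-correlation."""
    diffs = [abs(a - b) for a, b in zip(frame_a, frame_b)]
    return 1.0 - sum(diffs) / (255 * len(diffs))

def motion_detected(frame_a, frame_b, threshold=0.95):
    """Flag motion when similarity drops below the threshold."""
    return frame_similarity(frame_a, frame_b) < threshold

# Nearly identical frames vs. frames with large pixel changes
static_prev, static_next = [10, 20, 30, 40], [10, 20, 31, 40]
moving_prev, moving_next = [10, 20, 30, 40], [200, 20, 30, 200]
```

Aggregating these per-frame-pair decisions over a window of frames, as the
text suggests, smooths out single-frame noise when detecting camera movement.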
\section{Classification 500 words}
\section{Evaluation 200 words}