Add Feature Extraction

This commit is contained in:
Tobias Eidelpes 2021-10-20 18:06:46 +02:00
parent 89621232bd
commit c72ed9de6b

sim.tex

@@ -125,6 +125,78 @@ used in the computer science domain.
\section{Feature Engineering 500 words}
Contrary to popular opinion, the rise of deep learning methods in areas such as
object recognition has not superseded the classical approach of feature
engineering in other areas. Particularly in the audio domain and for motion
detection in videos, feature engineering is still the dominant method. This is
underlined by the fact that classical methods require much less processing,
which can be beneficial or even crucial for certain applications (e.g. edge
computing).

Feature engineering is part of the pipeline that transforms input data into
classes and labels for that data. After the modeling step comes feature
extraction, so that the resulting features can be mapped into the feature
space. The classification step then assigns to each input the labels
corresponding to its position in that space. In practice, feature engineering
deals with analyzing input signals. Common features one might be interested in
during analysis are the loudness (amplitude), the rhythm, or the motion of a
signal.
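
As a minimal illustration of this pipeline, the following sketch extracts a
single loudness feature from a synthetic signal and maps it to a label with a
toy threshold classifier; the feature set and the threshold are arbitrary
choices made for the example, not part of any particular method.

\begin{verbatim}
import numpy as np

def extract_features(signal):
    # Feature extraction: map the raw input signal to a feature vector.
    loudness = np.sqrt(np.mean(signal ** 2))  # RMS amplitude
    return np.array([loudness])

def classify(features, threshold=0.5):
    # Toy classifier: map the feature vector to a label.
    return "loud" if features[0] > threshold else "quiet"

signal = 0.8 * np.sin(np.linspace(0, 8 * np.pi, 1000))  # synthetic input
label = classify(extract_features(signal))               # -> "loud"
\end{verbatim}
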
There are four main features of interest when analyzing visual data: color,
texture, shape, and foreground versus background. Starting with color, the
first idea that springs to mind is to use the RGB color space to detect
specific colors. Depending on the application, this might not be the best
choice, because the three channels represent the pure primaries and changing
the hue of a color requires adjusting all three parameters (red, green and
blue). Other color spaces such as hue, saturation and value (HSV) are better
suited for color recognition, since we are usually only interested in the hue
of a color and can therefore generalize the detection better. Another option
is the \emph{CIE XYZ} color space, which is applicable to situations where
adherence to how human vision works is beneficial. For broadcasting
applications, color is often encoded using \emph{YCrCb}, where \emph{Y}
represents the luma (lightness) and \emph{Cr} and \emph{Cb} represent the
color differences $R-Y$ and $B-Y$ respectively. To find a dominant color
within an image, we can choose to only look at certain sections of the frame,
e.g. the center or the largest contiguous region of color. Another approach is
to use a color histogram to count the number of pixels per hue within the
frame.
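
A minimal sketch of the histogram approach, assuming an 8-bit BGR input image
and OpenCV's hue range of 0--179; the file name is a placeholder.

\begin{verbatim}
import cv2
import numpy as np

img = cv2.imread("frame.png")               # hypothetical input frame (BGR)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)  # hue is channel 0, range 0..179
hue = hsv[..., 0].ravel()
hist = np.bincount(hue, minlength=180)      # pixel count per hue bin
dominant_hue = int(np.argmax(hist))         # most frequent hue in the frame
\end{verbatim}
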
Recognizing objects by their texture can be divided into three different
methods. One approach is to look at the direction pixels are oriented towards
to get a measure of \emph{directionality}. Secondly, \emph{rhythm} allows us
to detect whether a patch of information (micro block) is repeated in its
neighborhood through \emph{autocorrelation}. Autocorrelation takes one
neighborhood and compares it, usually using a generalized distance measure, to
all other neighborhoods. If the similarity exceeds a certain threshold, there
is a high probability that a rhythm exists. Third, \emph{coarseness} can be
detected by applying a similar process, but looking at different window sizes
to determine whether there is any loss of information. If there is no loss of
information in the compressed (smaller) window, the image information is
coarse.
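
The following sketch illustrates the autocorrelation idea, using the mean
absolute difference as the generalized distance measure and block-aligned,
non-overlapping neighborhoods; the block size and threshold are arbitrary
example values.

\begin{verbatim}
import numpy as np

def block_rhythm(image, y, x, size=8, threshold=10.0):
    # Compare one micro block (at block-aligned coordinates y, x)
    # against every other block of the same size; a small mean
    # absolute difference counts as a repetition.
    ref = image[y:y + size, x:x + size].astype(float)
    matches = 0
    h, w = image.shape
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            if (i, j) == (y, x):
                continue
            block = image[i:i + size, j:j + size].astype(float)
            if np.mean(np.abs(block - ref)) < threshold:
                matches += 1
    return matches > 0  # at least one repetition suggests a rhythm
\end{verbatim}
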
Shape detection can be realized using \emph{kernels} of different sizes and
with different values. An edge detection algorithm might use a Sobel matrix to
compare neighborhoods of an image. If the filter response is high, there is a
high probability of an edge in that neighborhood.
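
A short sketch of Sobel-based edge detection, computing the gradient magnitude
from the two standard $3\times3$ Sobel kernels.

\begin{verbatim}
import numpy as np
from scipy.ndimage import convolve

# Standard Sobel kernels for horizontal and vertical gradients.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

def edge_magnitude(gray):
    # Convolve a grayscale image with both Sobel kernels and return
    # the gradient magnitude; large values indicate likely edges.
    gx = convolve(gray.astype(float), sobel_x)
    gy = convolve(gray.astype(float), sobel_y)
    return np.hypot(gx, gy)
\end{verbatim}
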
Foreground and background detection relies on the assumption that the
coarseness is, on average, higher for the background than for the foreground.
This only makes sense if the video has been recorded with a shallow depth of
field, so that the background is much more blurred than the foreground.
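
One plausible way to operationalize this, sketched below, is to score each
block by the local variance of the Laplacian, a common proxy for sharpness;
note that this measure is a substitute chosen for the example, not the
coarseness measure described above.

\begin{verbatim}
import numpy as np
from scipy.ndimage import laplace

def sharpness_map(gray, size=16):
    # Per-block variance of the Laplacian: blurred (background)
    # regions score low, in-focus (foreground) regions score high.
    lap = laplace(gray.astype(float))
    h, w = gray.shape
    scores = np.zeros((h // size, w // size))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            block = lap[i * size:(i + 1) * size, j * size:(j + 1) * size]
            scores[i, j] = block.var()
    return scores  # threshold to split foreground from background
\end{verbatim}
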
For audio feature extraction, three properties are of relevance: loudness,
fundamental frequency and rhythm. Specific audio sources have a distinct
loudness to them; classical music, for example, has a higher standard
deviation of loudness than metal. The fundamental frequency can be
particularly helpful in distinguishing speech from music by analyzing the
\emph{zero crossing rate} (ZCR). Speech has a lower ZCR than music, because
there is a limit on how fast humans can speak. Audio signals are often made up
of distinct patterns which are described by the attack, decay, sustain and
release (ADSR) model. This model is effective in rhythm detection.
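
A minimal sketch of the ZCR computation, with two synthetic tones standing in
for a speech-like and a music-like signal.

\begin{verbatim}
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of consecutive sample pairs whose sign differs.
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

t = np.linspace(0, 1, 16000, endpoint=False)
low = np.sin(2 * np.pi * 120 * t)    # ~120 Hz, speech-like pitch
high = np.sin(2 * np.pi * 2000 * t)  # ~2 kHz, music-like content
assert zero_crossing_rate(low) < zero_crossing_rate(high)
\end{verbatim}
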
Motion in videos is easily detected using cross-correlation between
consecutive frames. As in other domains, cross-correlation yields a similarity
measure for the two frames; if the similarity drops below a threshold
(equivalently, if a difference measure exceeds one), there is movement. The
per-frame measurements can be aggregated to provide a robust detection of
camera movement.
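
A sketch of this idea using normalized cross-correlation as the similarity
measure: consecutive frames that correlate poorly indicate movement. The
threshold is an arbitrary example value.

\begin{verbatim}
import numpy as np

def frame_similarity(prev, curr):
    # Normalized cross-correlation between two grayscale frames;
    # values near 1 mean the frames are almost identical.
    a = prev.astype(float) - prev.mean()
    b = curr.astype(float) - curr.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return (a * b).sum() / denom if denom else 1.0

def motion_detected(prev, curr, threshold=0.95):
    # Low similarity between consecutive frames indicates movement.
    return frame_similarity(prev, curr) < threshold
\end{verbatim}
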
\section{Classification 500 words}
\section{Evaluation 200 words}