Add Semantic Modeling section

Tobias Eidelpes 2021-10-23 20:41:09 +02:00
parent 7c55d2f024
commit 266b60704e

sim.tex | 29 ++++++++++++++++++++++++++++-

@@ -481,7 +481,34 @@ and the CT are successfully used in music recognition and speech recognition
with the \emph{mel-frequency cepstrum coefficients} (MFCC). The CT is used in
MPEG7 for \emph{color histogram encoding} and for texture computation.
\section{Semantic Modeling 200 words}
\section{Semantic Modeling}
The limits of similarity modeling become apparent when audio or visual
information provides only partial input for detecting higher-level semantics.
One example of such higher-level semantics is the detection of emotion in
videos. Simple questions, such as whether a particular person or object is
visible, whether the person interacts with someone, and whether that happens
multiple times throughout the video, are not as semantically complex as
recognizing emotion. The latter usually requires knowledge of the context in
which the event takes place.

\emph{Factor analysis} and \emph{latent semantic indexing} exploit the fact
that some information is encoded redundantly across multiple clusters within a
feature space. By extracting these factors, which are shared across multiple
groups, a small number of dimensions suffices to explain much of the variation
in the feature vectors.
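
A minimal sketch of this idea in Python, using the truncated singular value
decomposition from scikit-learn (the core operation behind latent semantic
indexing); the toy data matrix and the choice of two components are
illustrative assumptions, not values from the text:

\begin{verbatim}
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Toy feature matrix: 6 samples x 5 features. Two latent factors
# generate most of the structure, so information is shared across
# groups of features. (Values are illustrative only.)
X = np.array([
    [2.0, 1.9, 0.1, 0.2, 1.0],
    [2.1, 2.0, 0.0, 0.1, 1.1],
    [1.8, 1.7, 0.2, 0.3, 0.9],
    [0.1, 0.2, 2.0, 1.9, 1.0],
    [0.2, 0.1, 2.1, 2.2, 1.1],
    [0.0, 0.3, 1.9, 2.0, 0.9],
])

# Extract two latent factors; each sample is then described by its
# coordinates along these factors instead of all five raw features.
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)

print(Z.shape)                        # (6, 2)
print(svd.explained_variance_ratio_)  # two factors explain most variance
\end{verbatim}

Samples that load on the same factor end up close together in the reduced
space, even when they differ in individual raw features.
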
The building blocks of feature engineering are localization, correlation,
quantization, and aggregation. Localization is the process of decomposing an
input signal into shorter signals that can be analyzed individually. In the
correlation step, the output is compared to pre-existing knowledge. Sometimes
the information is then quantized in different ways and finally aggregated to
pick out the most important factors.
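
As a concrete illustration, the following Python sketch runs all four building
blocks on a synthetic signal; the window length, the template bank, and the
quantile bins are arbitrary assumptions made here for demonstration:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Input: a long one-dimensional signal (synthetic stand-in for audio).
signal = rng.standard_normal(8000)

# 1. Localization: cut the long signal into short analysis windows.
win = 256
windows = signal[: len(signal) // win * win].reshape(-1, win)

# 2. Correlation: compare each window against pre-existing knowledge,
#    here a bank of two reference templates (hypothetical patterns).
templates = rng.standard_normal((2, win))
scores = windows @ templates.T            # (n_windows, n_templates)

# 3. Quantization: map the continuous scores onto a few discrete levels.
bins = np.quantile(scores, [0.25, 0.5, 0.75])
levels = np.digitize(scores, bins)        # values in {0, 1, 2, 3}

# 4. Aggregation: summarize the quantized values into one descriptor,
#    here a histogram over all windows for each template.
descriptor = np.stack([np.bincount(levels[:, t], minlength=4)
                       for t in range(levels.shape[1])])
print(descriptor)                         # one 4-bin histogram per template
\end{verbatim}

The final descriptor is much shorter than the input signal and can be compared
directly between different inputs.
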
Multiple criteria exist to evaluate which features are good. One such
criterion, called stability, has already been discussed: clusters of samples
should be tightly coupled internally and strongly separated from other
clusters. Another criterion is that the features can be computed quickly.
Features should also be easily interpretable, which is not always obvious once
a transformation has been applied. Finally, features that generalize well to
different situations regardless of context are robust.
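
The stability criterion can be quantified, for instance, with the silhouette
coefficient; the Python sketch below applies scikit-learn's implementation to
two synthetic clusters whose locations and spread are assumptions chosen for
illustration:

\begin{verbatim}
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two synthetic feature clusters: tightly coupled internally and
# strongly separated from each other, i.e. a stable feature space.
a = rng.normal(loc=0.0, scale=0.3, size=(50, 2))
b = rng.normal(loc=5.0, scale=0.3, size=(50, 2))
X = np.vstack([a, b])
labels = np.array([0] * 50 + [1] * 50)

# Silhouette coefficient in [-1, 1]: values close to 1 indicate
# compact, well-separated clusters; values near 0 indicate overlap.
print(silhouette_score(X, labels))
\end{verbatim}

Moving the two clusters closer together, or increasing their spread, drives
the score toward zero, mirroring the loss of stability.
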
\section{Learning over Time 600 words}