diff --git a/sim.tex b/sim.tex
index 7599164..c09750d 100644
--- a/sim.tex
+++ b/sim.tex
@@ -481,7 +481,34 @@ and the CT are successfully used in music recognition and speech recognition
 with the \emph{mel-frequency cepstrum coefficients} (MFCC). The CT is used in
 MPEG7 for \emph{color histogram encoding} and for texture computation.
 
-\section{Semantic Modeling 200 words}
+\section{Semantic Modeling}
+
+The limits of similarity modeling become apparent when audio or visual
+information provides only part of the input needed to detect higher semantics,
+such as emotion in videos. Simple questions, such as whether a particular
+person or object is visible, whether the person interacts with someone, and
+whether that happens multiple times throughout the video, are not as
+semantically complex as recognizing emotion. The latter usually requires
+knowledge of the context in which the event takes place.
+
+\emph{Factor analysis} or \emph{latent semantic indexing} exploits the fact that
+some information is encoded in multiple clusters within a feature space. By
+extracting these factors, which are similar across multiple groups, it is
+possible to explain many features of the feature vectors.
+
+The building blocks of feature engineering are localization, correlation,
+quantization and aggregation. Localization is the process of getting from an
+input signal to shorter signals that can be analyzed. The output is then
+compared to pre-existing knowledge in the correlation step. Sometimes the
+information is then quantized in different ways and aggregated to pick the
+most important factors.
+
+Several metrics exist for evaluating features. One, stability, has already
+been discussed: clusters of samples are tightly coupled and strongly separated
+from other clusters. Another is how fast the features can be computed. Features
+should also be easily interpretable, which is not always the case once some
+transformation has been applied. If the features generalize well to different
+situations regardless of context, they are robust.
 
 \section{Learning over Time 600 words}
 