diff --git a/sim.tex b/sim.tex
index 7599164..c09750d 100644
--- a/sim.tex
+++ b/sim.tex
@@ -481,7 +481,34 @@ and the CT are successfully used in music recognition and speech recognition
 with the \emph{mel-frequency cepstrum coefficients} (MFCC). The CT is used in
 MPEG7 for \emph{color histogram encoding} and for texture computation.
 
-\section{Semantic Modeling 200 words}
+\section{Semantic Modeling}
+
+The limits of similarity modeling become apparent when audio or visual
+information provides only part of the input needed to detect higher semantics,
+such as emotion in videos. Simple questions, such as whether a particular person
+or object is visible, whether the person interacts with someone, and whether
+that happens multiple times throughout the video, are not as semantically
+complex as recognizing emotion. The latter usually requires knowledge of the
+context in which the event is taking place.
+
+\emph{Factor analysis} or \emph{latent semantic indexing} exploits the fact that
+some information is encoded in multiple clusters within a feature space. By
+extracting the underlying factors, which are similar across multiple groups, it
+is possible to explain many properties of the feature vectors.
+
+The building blocks of feature engineering are localization, correlation,
+quantization and aggregation. Localization splits an input signal into shorter
+segments that can be analyzed. In the correlation step, these segments are
+compared to pre-existing knowledge. Sometimes the resulting information is then
+quantized in different ways and aggregated to pick the most important
+factors.
+
+To evaluate which features are good, multiple metrics exist. One such metric,
+stability, has already been discussed: clusters of samples are tightly grouped
+and strongly separated from other clusters. Another metric is how fast the
+features can be computed. Features should also be easily interpretable, which is
+not always the case once a transformation has been applied. Features that
+generalize well to different situations regardless of context are robust.
 
 \section{Learning over Time 600 words}