Add Classification section

This commit is contained in: parent 3f6e543441, commit 65c0907685

sim.tex: 94 changed lines
@@ -197,7 +197,99 @@ measure is calculated from two frames and if the result exceeds a threshold,
there is movement. The similarity measurements can be aggregated to provide a
robust detection of camera movement.

\section{Classification}
The setting for classification is a feature space in which the samples form
clusters. The smaller and better-defined these clusters are, the better the
classification works. At the same time we want a high covariance between
clusters, so that different classes are easy to distinguish. Classification is
another filtering method: it reduces the input data, which may have millions of
dimensions, to simple predicates such as \emph{yes} or \emph{no}. The goal of
classification is therefore that semantic enrichment comes along with the
filtering process.

The two fundamental methods used in classification are \emph{separation} and
\emph{hedging}. Separation tries to draw a line between different classes in
the feature space. Hedging, on the other hand, uses perimeters to cluster
samples. Additionally, the centroid of each cluster is calculated, and the
covariance between two centroids acts as a measure of separation. Both methods
can be linked to \emph{concept theories} such as the \emph{classical} and the
\emph{prototype} theory. While the classical theory classifies things based on
their necessary and sufficient conditions, prototype theory judges a particular
thing by comparing it to typical examples. The former maps to the fundamental
method of separation in machine learning, the latter to the method of hedging.
In the big picture, hedging is remarkably similar to negative convolution, as
discussed earlier, while separation has parallels with positive convolution.

If we take separation as an example, there are multiple ways to split classes
using a simple line. One could draw a straight line between two classes without
caring about individual samples, which are then misclassified. This often
results in so-called \emph{underfitting}, because such a simple boundary cannot
capture the structure of the data. Conversely, if the line bends around too
many individual samples and becomes a function of high degree, the classifier
is likely \emph{overfitting} and will not work well on data it has not seen
before. Both underfitting and overfitting are common pitfalls, and the best
classifier lies somewhere in between. To be able to properly train, validate
and test a classifier, the available data are split into these three different
sets.
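
A minimal sketch of such a split in Python (the function name, the NumPy-based
interface and the fractions are illustrative assumptions, not part of the
notes):

\begin{verbatim}
import numpy as np

def train_val_test_split(samples, labels, val_frac=0.15, test_frac=0.15,
                         seed=0):
    """Shuffle the data and split it into training, validation and
    test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    n_test = int(len(samples) * test_frac)
    n_val = int(len(samples) * val_frac)
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return ((samples[train], labels[train]),
            (samples[val], labels[val]),
            (samples[test], labels[test]))
\end{verbatim}
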
\emph{Unsupervised classification} or \emph{clustering} employs either a
bottom-up or a top-down approach. Regardless of the chosen method, unsupervised
classification works with unlabeled data. The goal is to construct a
\emph{dendrogram}, a tree built from distance measures between individual
samples and between cluster centroids. In the bottom-up approach an individual
sample marks a leaf of the tree-like dendrogram and is connected to neighboring
samples through a negative convolution measurement. In the top-down approach
the dendrogram is not built from the leaves, but by starting from the centroid
of the entire feature space; distance measurements to the samples within the
feature space then recursively construct the dendrogram until all samples are
included.
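
A bottom-up (agglomerative) build of such a dendrogram can be sketched with
SciPy; here the plain Euclidean distance stands in for the negative convolution
measure, and the data are made up:

\begin{verbatim}
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy feature vectors (two obvious groups).
samples = np.array([[0.0, 0.1], [0.2, 0.0],
                    [5.0, 5.1], [5.2, 4.9]])

# Bottom-up clustering: every sample starts as a leaf and the closest
# clusters are merged step by step.
Z = linkage(samples, method="single")
print(Z)         # each row: merged clusters, merge distance, cluster size
# dendrogram(Z)  # would draw the tree (requires matplotlib)
\end{verbatim}
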
One method of \emph{supervised classification} is the \emph{vector space
model}. It is well-suited for finding items which are similar to a given item
(the query or hedge). Usually a simple distance measurement such as the
Euclidean distance provides results that are good enough, especially for online
shops with millions of products on offer, where a more sophisticated approach
would be too costly.
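
A minimal sketch of this kind of lookup, assuming the items are rows of a NumPy
matrix (the function name is illustrative):

\begin{verbatim}
import numpy as np

def most_similar(query, items, k=5):
    """Rank items by Euclidean distance to the query vector."""
    dists = np.linalg.norm(items - query, axis=1)
    return np.argsort(dists)[:k]   # indices of the k closest items
\end{verbatim}
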
Another method is \emph{k-nearest-neighbors}, which requires ground truth data.
Here, a new sample is classified by calculating the distances to all labeled
samples within a given diameter. The new sample is then assigned to the class
to which the majority of these closest samples belong.
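
A sketch of this majority vote over the k closest labeled samples (NumPy arrays
are assumed):

\begin{verbatim}
import numpy as np
from collections import Counter

def knn_predict(query, samples, labels, k=3):
    """Classify the query by a majority vote among its k nearest
    labeled samples."""
    dists = np.linalg.norm(samples - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
\end{verbatim}
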
\emph{K-means} requires information about the centroids of the individual
clusters. Distance measurements to the centroids determine to which cluster a
new sample belongs.
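
The assignment step described above as a small sketch; the centroids are
assumed to be given, and learning them iteratively is not shown:

\begin{verbatim}
import numpy as np

def assign_to_cluster(sample, centroids):
    """Return the index of the centroid closest to the sample."""
    dists = np.linalg.norm(centroids - sample, axis=1)
    return int(dists.argmin())
\end{verbatim}
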
\emph{Self-organizing maps} are similar to k-means, but with two changes.
First, all data outside the area of interest are ignored. Second, after a
winning cluster is found, its centroid is moved closer to the query object; the
process is then repeated for the remaining clusters. This second change
constitutes the first application of the concept of \emph{learning}.
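
The update of the winning cluster can be sketched as follows; a full
self-organizing map would also update grid neighbors with a decaying
neighborhood, which is omitted here, and the learning rate is an illustrative
choice:

\begin{verbatim}
import numpy as np

def som_update(centroids, query, learning_rate=0.1):
    """Move the winning centroid a small step towards the query.
    `centroids` is assumed to be a float NumPy array."""
    dists = np.linalg.norm(centroids - query, axis=1)
    winner = int(dists.argmin())
    centroids[winner] += learning_rate * (query - centroids[winner])
    return winner
\end{verbatim}
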
\emph{Decision trees} divide the feature space into arbitrarily sized regions,
and multiple regions together define a particular class. In practice this
method is highly prone to overfitting, which is why many trees are combined
into a random forest classifier.

\emph{Random forest classifiers} construct many randomized decision trees and
aggregate their predictions, for example by majority vote. Such classifiers are
also called \emph{ensemble methods}.
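
A sketch of both classifiers using scikit-learn; the library choice and the toy
data are assumptions, not something the notes prescribe:

\begin{verbatim}
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
y = [0, 0, 1, 1]

# A single, fully grown tree tends to overfit small datasets.
tree = DecisionTreeClassifier().fit(X, y)

# A random forest averages many randomized trees and is usually
# more robust.
forest = RandomForestClassifier(n_estimators=100).fit(X, y)
print(tree.predict([[4.8, 5.0]]), forest.predict([[4.8, 5.0]]))
\end{verbatim}
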
\emph{Deep networks} started off as simple \emph{perceptrons}, which cannot
solve the XOR problem because its classes are not linearly separable. The
conclusion was that there had to be additional hidden layers and
backpropagation to adjust the weights of the layers. It turned out that simply
stacking hidden layers is ineffective too, because backpropagation
disproportionately affects the later layers (\emph{vanishing gradients}). With
\emph{convolutional neural networks} (CNNs) all that changed, because they
combine automatic feature engineering with simple classification, processing on
the GPU and effective training.
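
To make the XOR point concrete, here is a tiny network with one hidden layer
and hand-picked (not learned) weights; a single-layer perceptron cannot
represent this function:

\begin{verbatim}
import numpy as np

def step(x):
    return (np.asarray(x) > 0).astype(int)

def xor_net(a, b):
    """Two-layer network with fixed weights that computes XOR."""
    h = step(np.array([a + b - 0.5,    # hidden unit 1: logical OR
                       a + b - 1.5]))  # hidden unit 2: logical AND
    return step(h[0] - h[1] - 0.5)     # output: OR and not AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))
\end{verbatim}
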
The \emph{radial basis function} network is a simpler classifier which consists
of one input layer, one hidden layer and one output layer. In the first layer
the input values are compared to codebook vectors using a generalization of
negative convolution. In the second layer the outputs of the first layer are
multiplied by the weights of the hidden layer and summed. This results in the
aforementioned dual process model, where negative convolution and positive
convolution are combined to form the output.
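
A sketch of this forward pass, where a Gaussian similarity stands in for the
generalized negative convolution and the output weights would normally be
learned:

\begin{verbatim}
import numpy as np

def rbf_forward(x, codebook, weights, gamma=1.0):
    """Hidden layer: similarity of x to each codebook vector.
    Output layer: weighted sum of those similarities."""
    dists = np.linalg.norm(codebook - x, axis=1)
    hidden = np.exp(-gamma * dists ** 2)
    return hidden @ weights
\end{verbatim}
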
\section{Evaluation 200 words}