Add Classification section
parent 3f6e543441
commit 65c0907685

sim.tex | 96
@@ -156,7 +156,7 @@ applications, color is often encoded using \emph{YCrCb}, where \emph{Y}
represents lightness and \emph{Cr} and \emph{Cb} represent $Y-R$ and $Y-B$
respectively. To find a dominant color within an image, we can choose to only
look at certain sections of the frame, e.g. the center or the largest continuous
region of color. Another approach is to use a color histogram to count the
number of different hues within the frame.

Recognizing objects by their texture can be divided into three different
@@ -197,7 +197,99 @@ measure is calculated from two frames and if the result exceeds a threshold,
there is movement. The similarity measurements can be aggregated to provide a
robust detection of camera movement.

\section{Classification}

The setting for classification is described by taking a feature space and
clustering the samples within that feature space. The smaller and more sharply
defined the clusters are, the better the classification works. At the same time
we want the clusters to be well separated so that different classes are easily
distinguishable. Classification is another filtering method which reduces the
input data, sometimes on the order of millions of dimensions, into simple
predicates, e.g. \emph{yes} or \emph{no} instances. The goal of classification
is therefore that semantic enrichment comes along with the filtering process.

The two fundamental methods used in classification are \emph{separation} and
\emph{hedging}. Separation tries to draw a line between different classes in the
feature space. Hedging, on the other hand, uses perimeters to cluster samples.
Additionally, the centroid of each cluster is calculated and the distance
between two centroids acts as a measure of separation. Both methods can be
linked to \emph{concept theories} such as the \emph{classical} and the
\emph{prototype} theory. While the classical theory classifies things based on
their necessary and sufficient conditions, prototype theory uses typical
examples to come to a conclusion about a particular thing. The former can be
mapped to the fundamental method of separation in machine learning, whereas the
latter maps to the method of hedging. In the big picture, hedging is remarkably
similar to negative convolution, as discussed earlier. Separation, on the other
hand, has parallels with positive convolution.

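To make the two methods concrete, the following Python sketch (with made-up
two-dimensional vectors; the weights, centroid and radius are placeholders, not
values from the text) classifies a query point once by a linear separation
boundary and once by a hedge, i.e. by testing whether the point lies within a
perimeter around a centroid:
\begin{verbatim}
import numpy as np

# Hypothetical 2-D feature vector of the query (placeholder values).
query = np.array([0.6, 0.4])

# Separation: a linear boundary w.x + b = 0 splits the feature space.
w, b = np.array([1.0, -1.0]), 0.0
by_separation = "class A" if np.dot(w, query) + b > 0 else "class B"

# Hedging: a centroid plus a perimeter (radius) defines the class region.
centroid, radius = np.array([0.5, 0.5]), 0.3
inside_hedge = np.linalg.norm(query - centroid) <= radius

print(by_separation, inside_hedge)
\end{verbatim}
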
If we take separation as an example, there are multiple ways in which we can
split classes using a simple line. One could draw a straight line between two
classes without caring about individual samples, which are then misclassified.
This often results in so-called \emph{underfitting}, because the classifier does
not capture the structure of the data. Conversely, if the line bends around too
many individual samples and is a function of high degree, the classifier is
likely \emph{overfitting} and will not work well on a dataset it has not seen
before. Both underfitting and overfitting are common pitfalls to avoid, as the
best classifier lies somewhere in between the two. To be able to properly train,
test and validate a classifier, the data are split into these three different
sets.

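A common way to obtain the three sets is to split the available data twice, as
in the following sketch; the 60/20/20 proportions and the use of scikit-learn's
\texttt{train\_test\_split} helper are illustrative choices, not prescribed by
the text:
\begin{verbatim}
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix and labels (random placeholder data).
X = np.random.rand(1000, 8)
y = np.random.randint(0, 2, size=1000)

# First split off a test set, then split the rest into training and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25)

print(len(X_train), len(X_val), len(X_test))  # roughly 60/20/20 percent
\end{verbatim}
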
\emph{Unsupervised classification} or \emph{clustering} employs either a
bottom-up or a top-down approach. Regardless of the chosen method, unsupervised
classification works with unlabeled data. The goal is to construct a
\emph{dendrogram} which records the distance measures between samples and
between cluster centroids. In the bottom-up approach an individual sample marks
a leaf of the tree-like dendrogram and is connected through a negative
convolution measurement to neighboring samples. In the top-down approach the
dendrogram is not built from the leaves, but by starting from the centroid of
the entire feature space. Distance measurements to samples within the feature
space recursively construct the dendrogram until all samples are included.

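The bottom-up approach corresponds to agglomerative clustering. A minimal sketch
with SciPy, using made-up sample coordinates and an arbitrary cut-off threshold,
could look as follows; note that it uses plain Euclidean/Ward distances rather
than the negative convolution measure mentioned above:
\begin{verbatim}
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D samples forming two loose groups (placeholder data).
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
              [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]])

# Bottom-up (agglomerative) clustering: each sample starts as its own leaf
# and the closest clusters are merged step by step.
Z = linkage(X, method="ward")

# Cutting the resulting dendrogram at a distance threshold yields flat clusters.
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)
\end{verbatim}
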
One method of \emph{supervised classification} is the \emph{vector space model}.
It is well suited for finding items which are similar to a given item (= the
query or hedge). Usually, a simple distance measurement such as the Euclidean
distance provides results which are good enough, especially for online shops
where there are millions of products on offer and a more sophisticated approach
is too costly.

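A minimal sketch of such a lookup, with a randomly generated placeholder catalog
of feature vectors, ranks all items by their Euclidean distance to the query:
\begin{verbatim}
import numpy as np

# Hypothetical catalog: each row is the feature vector of one product.
catalog = np.random.rand(100_000, 16)
query = np.random.rand(16)

# Vector space model: rank items by Euclidean distance to the query vector.
distances = np.linalg.norm(catalog - query, axis=1)
most_similar = np.argsort(distances)[:10]  # indices of the 10 closest items
print(most_similar)
\end{verbatim}
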
Another method is \emph{k-nearest-neighbors}, which requires ground truth data.
Here, a new sample is classified by calculating the distances to the neighbors
within a given radius. The new datum is assigned to the class which contains the
closest samples.

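The following sketch implements this idea with the $k$ closest samples (the
usual formulation) instead of a fixed radius; the training points and labels are
placeholders:
\begin{verbatim}
import numpy as np

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by a majority vote among its k nearest neighbors."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical ground-truth samples (placeholder values).
X_train = np.array([[0.1, 0.1], [0.2, 0.2], [0.9, 0.8], [0.8, 0.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.85, 0.85])))  # -> 1
\end{verbatim}
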
\emph{K-means} requires information about the centroids of the individual
clusters. Distance measurements to the centroids determine to which cluster the
new sample belongs.

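The assignment step can be sketched as follows, with the centroids assumed to be
given (their values here are placeholders):
\begin{verbatim}
import numpy as np

# Hypothetical centroids of three clusters (placeholder values).
centroids = np.array([[0.2, 0.2], [0.5, 0.8], [0.9, 0.1]])

def assign_cluster(x):
    """Assign a sample to the cluster whose centroid is closest."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(assign_cluster(np.array([0.45, 0.7])))  # -> 1
\end{verbatim}
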
\emph{Self-organizing maps} are similar to k-means, but with two changes. First,
all data outside of the area of interest are ignored. Second, after a winning
cluster is found, its centroid is moved closer to the query object. The process
is repeated for all other clusters. This second variation on k-means constitutes
the first application of the concept of \emph{learning}.

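A minimal sketch of the update step, with placeholder codebook vectors and an
arbitrary learning rate, moves the winning unit a small step towards the new
sample:
\begin{verbatim}
import numpy as np

# Hypothetical map units (codebook vectors) and a new sample (placeholders).
units = np.array([[0.2, 0.2], [0.5, 0.8], [0.9, 0.1]])
sample = np.array([0.45, 0.7])
learning_rate = 0.1

# Find the winning unit and pull it a small step towards the sample.
winner = np.argmin(np.linalg.norm(units - sample, axis=1))
units[winner] += learning_rate * (sample - units[winner])
print(units[winner])
\end{verbatim}
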
\emph{Decision trees} divide the feature space into arbitrarily sized regions.
Multiple regions define a particular class. In practice this method is highly
prone to overfitting, which is why many trees are combined to form a random
forest classifier.

\emph{Random forest classifiers} construct many randomized decision trees and
combine their predictions, e.g. by majority vote. Such classifiers are also
called \emph{ensemble methods}.

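In scikit-learn such an ensemble can be sketched as follows; the data here are
random placeholders, so the fitted model and its predictions carry no meaning
beyond illustrating the API:
\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labeled data (random placeholder values).
X = np.random.rand(200, 5)
y = np.random.randint(0, 2, size=200)

# An ensemble of decision trees, each trained on a bootstrap sample of the data.
forest = RandomForestClassifier(n_estimators=100)
forest.fit(X, y)
print(forest.predict(X[:3]))
\end{verbatim}
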
\emph{Deep networks} started off as simple \emph{perceptrons}, which were unable
to solve the XOR problem. The conclusion was that there had to be additional
hidden layers and back-propagation to adjust the weights of the layers. Simply
stacking many hidden layers turned out to be ineffective too, because the
back-propagated error shrinks from layer to layer, so the early layers hardly
learn (\emph{vanishing gradients}). With \emph{convolutional neural networks}
(CNNs) all that changed, because they combine automatic feature engineering with
simple classification, processing on the GPU and effective training.

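The XOR point can be made concrete with a tiny network whose weights are set by
hand rather than learned: one hidden layer of two threshold units suffices to
compute XOR, which no single linear threshold unit can do.
\begin{verbatim}
def step(z):
    # Heaviside threshold: 1 if the weighted input is positive, else 0.
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hand-set weights: hidden unit 1 computes OR, hidden unit 2 computes AND,
    # the output unit fires for "OR but not AND", i.e. XOR.
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))
\end{verbatim}
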
The \emph{radial basis function network} is a simpler classifier which consists
of one input layer, one hidden layer and one output layer. In the hidden layer
we compare the input values to codebook vectors and employ a generalization of
negative convolution. In the output layer the outputs from the hidden layer are
multiplied by weights and summed. This results in the aforementioned dual
process model where negative convolution and positive convolution are employed
to form the output.

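A sketch of the forward pass, with placeholder codebook vectors, output weights
and Gaussian width (the mapping to negative and positive convolution follows the
text's terminology), could look like this:
\begin{verbatim}
import numpy as np

# Hypothetical codebook vectors of the hidden layer and output weights
# (placeholder values, not taken from the text).
codebook = np.array([[0.2, 0.2], [0.5, 0.8], [0.9, 0.1]])
weights = np.array([0.3, 1.2, -0.5])
width = 0.2

def rbf_forward(x):
    # Hidden layer: a Gaussian of the distance to each codebook vector
    # (the comparison/"negative convolution" step).
    dist = np.linalg.norm(codebook - x, axis=1)
    activations = np.exp(-dist ** 2 / (2 * width ** 2))
    # Output layer: weighted sum of the hidden activations.
    return float(np.dot(weights, activations))

print(rbf_forward(np.array([0.45, 0.7])))
\end{verbatim}
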
\section{Evaluation 200 words}
