From 65c09076858155abd7256735af48f76eaf592a4d Mon Sep 17 00:00:00 2001
From: Tobias Eidelpes
Date: Thu, 21 Oct 2021 15:54:43 +0200
Subject: [PATCH] Add Classification section

---
 sim.tex | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 94 insertions(+), 2 deletions(-)

diff --git a/sim.tex b/sim.tex
index aa5685d..4d9d045 100644
--- a/sim.tex
+++ b/sim.tex
@@ -156,7 +156,7 @@ applications, color is often encoded using \emph{YCrCb}, where \emph{Y}
 represents lightness and \emph{Cr} and \emph{Cb} represent $Y-R$ and $Y-B$
 respectively. To find a dominant color within an image, we can choose to only
 look at certain sections of the frame, e.g. the center or the largest continuous
-region of color. Another approach is to use a color histogram to count the
+region of color. Another approach is to use a color histogram to count the
 number of different hues within the frame.
 
 Recognizing objects by their texture can be divided into three different
@@ -197,7 +197,99 @@ measure is calculated from two frames and if the result exceeds a threshold,
 there is movement. The similarity measurements can be aggregated to provide a
 robust detection of camera movement.
 
-\section{Classification 500 words}
+\section{Classification}
+
+The setting for classification is a feature space in which the samples form
+clusters. The smaller and better defined these clusters are, the better
+classification works. At the same time, we want a high covariance between
+clusters so that different classes are easily distinguishable. Classification
+is another filtering method which reduces the input data, sometimes on the
+order of millions of dimensions, to simple predicates, e.g. \emph{yes} or
+\emph{no} instances. The goal of classification is therefore to combine this
+filtering with semantic enrichment.
+
+The two fundamental methods used in classification are \emph{separation} and
+\emph{hedging}. Separation tries to draw a line between different classes in
+the feature space. Hedging, on the other hand, uses perimeters to cluster
+samples. Additionally, the centroid of each cluster is calculated and the
+covariance between two centroids acts as a measure of separation. Both methods
+can be linked to \emph{concept theories} such as the \emph{classical} and the
+\emph{prototype} theory. While classical theory classifies things based on
+their necessary and sufficient conditions, prototype theory uses typical
+examples to come to a conclusion about a particular thing. The former can be
+mapped to the fundamental method of separation in machine learning, whereas
+the latter maps to the method of hedging. In the big picture, hedging is
+remarkably similar to negative convolution, as discussed earlier. Separation,
+on the other hand, has parallels with positive convolution.
+
+If we take separation as an example, there are multiple ways in which classes
+can be split by a simple line. One could draw a straight line between two
+classes without caring about individual samples, which are then misclassified.
+This often results in so-called \emph{underfitting}, because the boundary is
+too simple to capture the structure of the data. Conversely, if the line bends
+around too many individual samples and becomes a function of high degree, the
+classifier is likely \emph{overfitting} and will not work well on data it has
+not seen before. Both underfitting and overfitting are common pitfalls to
+avoid, as the best classifier lies somewhere in between the two. To be able to
+properly train, validate and test a classifier, the available data are split
+into these three corresponding sets.
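+
+As an illustrative sketch of this workflow, the following Python snippet
+(assuming \texttt{scikit-learn} is available; the toy dataset and the
+polynomial degrees are placeholders) performs such a split and compares a
+straight-line boundary with a high-degree one:
+
+\begin{verbatim}
+# Sketch: train/validation/test split and under- vs. overfitting.
+from sklearn.datasets import make_moons
+from sklearn.model_selection import train_test_split
+from sklearn.pipeline import make_pipeline
+from sklearn.preprocessing import PolynomialFeatures
+from sklearn.linear_model import LogisticRegression
+
+X, y = make_moons(n_samples=600, noise=0.3, random_state=0)
+# Hold out a test set first, then split the rest into train/validation.
+X_rest, X_test, y_rest, y_test = train_test_split(
+    X, y, test_size=0.2, random_state=0)
+X_train, X_val, y_train, y_val = train_test_split(
+    X_rest, y_rest, test_size=0.25, random_state=0)
+
+for degree in (1, 15):  # straight line vs. high-degree boundary
+    clf = make_pipeline(PolynomialFeatures(degree),
+                        LogisticRegression(max_iter=5000))
+    clf.fit(X_train, y_train)
+    print(degree, clf.score(X_train, y_train), clf.score(X_val, y_val))
+\end{verbatim}
+
+The degree which performs best on the validation set would then be evaluated
+once on the held-out test set.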
+
+\emph{Unsupervised classification} or \emph{clustering} employs either a
+bottom-up or a top-down approach. Regardless of the chosen method,
+unsupervised classification works with unlabeled data. The goal is to
+construct a \emph{dendrogram} which records the distances between individual
+samples and between samples and cluster centroids. In the bottom-up approach
+an individual sample marks a leaf of the tree-like dendrogram and is connected
+through a negative convolution measurement to neighboring samples. In the
+top-down approach the dendrogram is not built from the leaves, but by starting
+from the centroid of the entire feature space. Distance measurements to
+samples within the feature space recursively construct the dendrogram until
+all samples are included.
+
+One method of \emph{supervised classification} is the \emph{vector space
+model}. It is well-suited for finding items which are similar to a given item
+(the query or hedge). Usually, a simple distance measure such as the Euclidean
+distance provides results which are good enough, especially for online shops
+where there are millions of products on offer and a more sophisticated
+approach is too costly.
+
+Another method is \emph{k-nearest-neighbors}, which requires ground truth
+data. Here, a new sample is classified by calculating the distance to all
+neighbors within a given diameter. The new datum is added to the cluster which
+contains the closest samples.
+
+\emph{K-means} requires information about the centroids of the individual
+clusters. Distance measurements to the centroids determine to which cluster a
+new sample belongs.
+
+\emph{Self-organizing maps} are similar to k-means, but with two changes.
+First, all data outside of the area of interest are ignored. Second, after a
+winning cluster is found, it is moved closer to the query object. The process
+is repeated for all other clusters. This second variation on k-means
+constitutes the first application of the concept of \emph{learning}.
+
+\emph{Decision trees} divide the feature space into arbitrarily sized regions.
+Multiple regions define a particular class. In practice this method is highly
+prone to overfitting, which is why multiple trees are combined into a random
+forest classifier.
+
+\emph{Random forest classifiers} construct many decision trees and pick the
+best-performing ones. Such classifiers are also called \emph{ensemble
+methods}.
+
+\emph{Deep networks} started off as simple \emph{perceptrons}, which were
+unable to solve the XOR problem. The conclusion was that additional hidden
+layers and backpropagation were needed to adjust the weights of the layers.
+Simply stacking hidden layers turned out to be ineffective as well, because
+backpropagation adjusts mainly the later layers while the gradients reaching
+the earlier layers become vanishingly small (\emph{vanishing gradients}). With
+\emph{convolutional neural networks} (CNNs) all of that changed, because they
+combine automatic feature engineering with simple classification, processing
+on the GPU and effective training.
+
+The \emph{radial basis function} network is a simpler classifier which
+consists of one input layer, one hidden layer and one output layer. In the
+first (hidden) layer the input values are compared to codebook vectors using a
+generalization of negative convolution. In the second (output) layer the
+outputs of the hidden layer are multiplied by weights and combined. This
+results in the aforementioned dual process model, where negative convolution
+and positive convolution are employed to form the output.
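+
+As an illustrative sketch, the following Python snippet (using only
+\texttt{numpy}; the codebook vectors, the width and the output weights are
+placeholders) computes the forward pass of such a radial basis function
+network with Gaussian basis functions:
+
+\begin{verbatim}
+# Sketch of a radial basis function (RBF) network forward pass.
+import numpy as np
+
+codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # prototypes
+width = 0.5                                  # shared width of the RBFs
+w_out = np.array([0.7, -0.4, 0.9])           # hidden-to-output weights
+
+def rbf_forward(x):
+    # Hidden layer: compare the input to every codebook vector
+    # (a distance-based comparison, in the spirit of negative convolution).
+    d2 = np.sum((codebook - x) ** 2, axis=1)
+    hidden = np.exp(-d2 / (2.0 * width ** 2))
+    # Output layer: weighted combination of the hidden activations.
+    return hidden @ w_out
+
+print(rbf_forward(np.array([0.2, 0.8])))
+\end{verbatim}
+
+The codebook vectors play the role of prototypes, while the output weights
+determine how their activations are combined into the final decision value.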
 
 \section{Evaluation 200 words}