Add Classification section

This commit is contained in: parent 3f6e543441, commit 65c0907685

sim.tex: 94 changed lines
@@ -197,7 +197,99 @@ measure is calculated from two frames and if the result exceeds a threshold,
there is movement. The similarity measurements can be aggregated to provide a
robust detection of camera movement.

\section{Classification}
The setting for classification is a feature space in which the samples form
clusters. The smaller and better-defined these clusters are, the better the
classification works. At the same time we want a high covariance between
clusters, so that different classes are easy to distinguish. Classification is
another filtering method: it reduces the input data, which may have millions of
dimensions, to simple predicates such as \emph{yes} or \emph{no}. The goal of
classification is therefore that semantic enrichment comes along with the
filtering process.

The two fundamental methods used in classification are \emph{separation} and
\emph{hedging}. Separation tries to draw a line between different classes in
the feature space. Hedging, on the other hand, uses perimeters to cluster
samples. Additionally, the centroid of each cluster is calculated, and the
covariance between two centroids acts as a measure of separation. Both methods
can be linked to \emph{concept theories} such as the \emph{classical} and the
\emph{prototype} theory. While the classical theory classifies things based on
their necessary and sufficient conditions, prototype theory judges a particular
thing by comparing it to typical examples. The former maps to the fundamental
method of separation in machine learning, the latter to the method of hedging.
In the big picture, hedging is remarkably similar to negative convolution, as
discussed earlier, while separation has parallels with positive convolution.

If we take separation as an example, there are multiple ways to split classes
using a simple line. One could draw a straight line between two classes without
caring about individual samples, which are then misclassified. This often
results in so-called \emph{underfitting}, because such a simple boundary cannot
capture the structure of the data. Conversely, if the line bends around too
many individual samples and becomes a function of high degree, the classifier
is likely \emph{overfitting} and will not work well on data it has not seen
before. Both underfitting and overfitting are common pitfalls, and the best
classifier lies somewhere in between. To be able to properly train, validate
and test a classifier, the available data are split into these three different
sets.
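
A minimal sketch of such a split in Python (the function name, the NumPy-based
interface and the fractions are illustrative assumptions, not part of the
notes):

\begin{verbatim}
import numpy as np

def train_val_test_split(samples, labels, val_frac=0.15, test_frac=0.15,
                         seed=0):
    """Shuffle the data and split it into training, validation and
    test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    n_test = int(len(samples) * test_frac)
    n_val = int(len(samples) * val_frac)
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return ((samples[train], labels[train]),
            (samples[val], labels[val]),
            (samples[test], labels[test]))
\end{verbatim}
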
\emph{Unsupervised classification} or \emph{clustering} employs either a
bottom-up or a top-down approach. Regardless of the chosen method, unsupervised
classification works with unlabeled data. The goal is to construct a
\emph{dendrogram}, a tree built from distance measures between individual
samples and between cluster centroids. In the bottom-up approach an individual
sample marks a leaf of the tree-like dendrogram and is connected to neighboring
samples through a negative convolution measurement. In the top-down approach
the dendrogram is not built from the leaves, but by starting from the centroid
of the entire feature space; distance measurements to the samples within the
feature space then recursively construct the dendrogram until all samples are
included.
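
A bottom-up (agglomerative) build of such a dendrogram can be sketched with
SciPy; here the plain Euclidean distance stands in for the negative convolution
measure, and the data are made up:

\begin{verbatim}
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy feature vectors (two obvious groups).
samples = np.array([[0.0, 0.1], [0.2, 0.0],
                    [5.0, 5.1], [5.2, 4.9]])

# Bottom-up clustering: every sample starts as a leaf and the closest
# clusters are merged step by step.
Z = linkage(samples, method="single")
print(Z)         # each row: merged clusters, merge distance, cluster size
# dendrogram(Z)  # would draw the tree (requires matplotlib)
\end{verbatim}
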
One method of \emph{supervised classification} is the \emph{vector space
model}. It is well-suited for finding items which are similar to a given item
(the query or hedge). Usually a simple distance measurement such as the
Euclidean distance provides results that are good enough, especially for online
shops with millions of products on offer, where a more sophisticated approach
would be too costly.
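
A minimal sketch of this kind of lookup, assuming the items are rows of a NumPy
matrix (the function name is illustrative):

\begin{verbatim}
import numpy as np

def most_similar(query, items, k=5):
    """Rank items by Euclidean distance to the query vector."""
    dists = np.linalg.norm(items - query, axis=1)
    return np.argsort(dists)[:k]   # indices of the k closest items
\end{verbatim}
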
Another method is \emph{k-nearest-neighbors}, which requires ground truth data.
Here, a new sample is classified by calculating the distances to all labeled
samples within a given diameter. The new sample is then assigned to the class
to which the majority of these closest samples belong.
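
A sketch of this majority vote over the k closest labeled samples (NumPy arrays
are assumed):

\begin{verbatim}
import numpy as np
from collections import Counter

def knn_predict(query, samples, labels, k=3):
    """Classify the query by a majority vote among its k nearest
    labeled samples."""
    dists = np.linalg.norm(samples - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
\end{verbatim}
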
\emph{K-means} requires information about the centroids of the individual
clusters. Distance measurements to the centroids determine to which cluster a
new sample belongs.
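
The assignment step described above as a small sketch; the centroids are
assumed to be given, and learning them iteratively is not shown:

\begin{verbatim}
import numpy as np

def assign_to_cluster(sample, centroids):
    """Return the index of the centroid closest to the sample."""
    dists = np.linalg.norm(centroids - sample, axis=1)
    return int(dists.argmin())
\end{verbatim}
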
\emph{Self-organizing maps} are similar to k-means, but with two changes.
First, all data outside the area of interest are ignored. Second, after a
winning cluster is found, its centroid is moved closer to the query object; the
process is then repeated for the remaining clusters. This second change
constitutes the first application of the concept of \emph{learning}.
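
The update of the winning cluster can be sketched as follows; a full
self-organizing map would also update grid neighbors with a decaying
neighborhood, which is omitted here, and the learning rate is an illustrative
choice:

\begin{verbatim}
import numpy as np

def som_update(centroids, query, learning_rate=0.1):
    """Move the winning centroid a small step towards the query.
    `centroids` is assumed to be a float NumPy array."""
    dists = np.linalg.norm(centroids - query, axis=1)
    winner = int(dists.argmin())
    centroids[winner] += learning_rate * (query - centroids[winner])
    return winner
\end{verbatim}
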
\emph{Decision trees} divide the feature space into arbitrarily sized regions,
and multiple regions together define a particular class. In practice this
method is highly prone to overfitting, which is why many trees are combined
into a random forest classifier.

\emph{Random forest classifiers} construct many randomized decision trees and
aggregate their predictions, for example by majority vote. Such classifiers are
also called \emph{ensemble methods}.
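
A sketch of both classifiers using scikit-learn; the library choice and the toy
data are assumptions, not something the notes prescribe:

\begin{verbatim}
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
y = [0, 0, 1, 1]

# A single, fully grown tree tends to overfit small datasets.
tree = DecisionTreeClassifier().fit(X, y)

# A random forest averages many randomized trees and is usually
# more robust.
forest = RandomForestClassifier(n_estimators=100).fit(X, y)
print(tree.predict([[4.8, 5.0]]), forest.predict([[4.8, 5.0]]))
\end{verbatim}
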
\emph{Deep networks} started off as simple \emph{perceptrons}, which cannot
solve the XOR problem because its classes are not linearly separable. The
conclusion was that there had to be additional hidden layers and
backpropagation to adjust the weights of the layers. It turned out that simply
stacking hidden layers is ineffective too, because backpropagation
disproportionately affects the later layers (\emph{vanishing gradients}). With
\emph{convolutional neural networks} (CNNs) all that changed, because they
combine automatic feature engineering with simple classification, processing on
the GPU and effective training.
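
To make the XOR point concrete, here is a tiny network with one hidden layer
and hand-picked (not learned) weights; a single-layer perceptron cannot
represent this function:

\begin{verbatim}
import numpy as np

def step(x):
    return (np.asarray(x) > 0).astype(int)

def xor_net(a, b):
    """Two-layer network with fixed weights that computes XOR."""
    h = step(np.array([a + b - 0.5,    # hidden unit 1: logical OR
                       a + b - 1.5]))  # hidden unit 2: logical AND
    return step(h[0] - h[1] - 0.5)     # output: OR and not AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))
\end{verbatim}
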
The \emph{radial basis function} network is a simpler classifier which consists
of one input layer, one hidden layer and one output layer. In the first layer
the input values are compared to codebook vectors using a generalization of
negative convolution. In the second layer the outputs of the first layer are
multiplied by the weights of the hidden layer and summed. This results in the
aforementioned dual process model, where negative convolution and positive
convolution are combined to form the output.
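
A sketch of this forward pass, where a Gaussian similarity stands in for the
generalized negative convolution and the output weights would normally be
learned:

\begin{verbatim}
import numpy as np

def rbf_forward(x, codebook, weights, gamma=1.0):
    """Hidden layer: similarity of x to each codebook vector.
    Output layer: weighted sum of those similarities."""
    dists = np.linalg.norm(codebook - x, axis=1)
    hidden = np.exp(-gamma * dists ** 2)
    return hidden @ weights
\end{verbatim}
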
\section{Evaluation 200 words}