Add Classification section
parent 3f6e543441
commit 65c0907685

sim.tex | 96
@@ -156,7 +156,7 @@ applications, color is often encoded using \emph{YCrCb}, where \emph{Y}
represents lightness and \emph{Cr} and \emph{Cb} represent $Y-R$ and $Y-B$
respectively. To find a dominant color within an image, we can choose to only
look at certain sections of the frame, e.g. the center or the largest continuous
region of color. Another approach is to use a color histogram to count the
number of different hues within the frame.

Recognizing objects by their texture can be divided into three different
@@ -197,7 +197,99 @@ measure is calculated from two frames and if the result exceeds a threshold,
there is movement. The similarity measurements can be aggregated to provide a
robust detection of camera movement.

\section{Classification}

The setting for classification is described by taking a feature space and
clustering the samples within that feature space. The smaller and more sharply
defined the clusters are, the better the classification works. At the same time
we want the clusters to be well separated so that different classes are easily
distinguishable. Classification is another filtering method which reduces the
input data, sometimes on the order of millions of dimensions, into simple
predicates, e.g. \emph{yes} or \emph{no} instances. The goal of classification
is therefore that semantic enrichment comes along with the filtering process.

The two fundamental methods used in classification are \emph{separation} and
\emph{hedging}. Separation tries to draw a line between different classes in the
feature space. Hedging, on the other hand, uses perimeters to cluster samples.
Additionally, the centroid of each cluster is calculated and the distance
between two centroids acts as a measure of separation. Both methods can be
linked to \emph{concept theories} such as the \emph{classical} and the
\emph{prototype} theory. While the classical theory classifies things based on
their necessary and sufficient conditions, prototype theory uses typical
examples to come to a conclusion about a particular thing. The former can be
mapped to the fundamental method of separation in machine learning, whereas the
latter maps to the method of hedging. In the big picture, hedging is remarkably
similar to negative convolution, as discussed earlier. Separation, on the other
hand, has parallels with positive convolution.

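To make the two methods concrete, the following Python sketch (with made-up
two-dimensional vectors; the weights, centroid and radius are placeholders, not
values from the text) classifies a query point once by a linear separation
boundary and once by a hedge, i.e. by testing whether the point lies within a
perimeter around a centroid:
\begin{verbatim}
import numpy as np

# Hypothetical 2-D feature vector of the query (placeholder values).
query = np.array([0.6, 0.4])

# Separation: a linear boundary w.x + b = 0 splits the feature space.
w, b = np.array([1.0, -1.0]), 0.0
by_separation = "class A" if np.dot(w, query) + b > 0 else "class B"

# Hedging: a centroid plus a perimeter (radius) defines the class region.
centroid, radius = np.array([0.5, 0.5]), 0.3
inside_hedge = np.linalg.norm(query - centroid) <= radius

print(by_separation, inside_hedge)
\end{verbatim}
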
If we take separation as an example, there are multiple ways in which we can
split classes using a simple line. One could draw a straight line between two
classes without caring about individual samples, which are then misclassified.
This often results in so-called \emph{underfitting}, because the classifier does
not capture the structure of the data. Conversely, if the line bends around too
many individual samples and is a function of high degree, the classifier is
likely \emph{overfitting} and will not work well on a dataset it has not seen
before. Both underfitting and overfitting are common pitfalls to avoid, as the
best classifier lies somewhere in between the two. To be able to properly train,
test and validate a classifier, the data are split into these three different
sets.

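A common way to obtain the three sets is to split the available data twice, as
in the following sketch; the 60/20/20 proportions and the use of scikit-learn's
\texttt{train\_test\_split} helper are illustrative choices, not prescribed by
the text:
\begin{verbatim}
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix and labels (random placeholder data).
X = np.random.rand(1000, 8)
y = np.random.randint(0, 2, size=1000)

# First split off a test set, then split the rest into training and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25)

print(len(X_train), len(X_val), len(X_test))  # roughly 60/20/20 percent
\end{verbatim}
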
\emph{Unsupervised classification} or \emph{clustering} employs either a
bottom-up or a top-down approach. Regardless of the chosen method, unsupervised
classification works with unlabeled data. The goal is to construct a
\emph{dendrogram} which records the distance measures between samples and
between cluster centroids. In the bottom-up approach an individual sample marks
a leaf of the tree-like dendrogram and is connected through a negative
convolution measurement to neighboring samples. In the top-down approach the
dendrogram is not built from the leaves, but by starting from the centroid of
the entire feature space. Distance measurements to samples within the feature
space recursively construct the dendrogram until all samples are included.

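The bottom-up approach corresponds to agglomerative clustering. A minimal sketch
with SciPy, using made-up sample coordinates and an arbitrary cut-off threshold,
could look as follows; note that it uses plain Euclidean/Ward distances rather
than the negative convolution measure mentioned above:
\begin{verbatim}
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D samples forming two loose groups (placeholder data).
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
              [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]])

# Bottom-up (agglomerative) clustering: each sample starts as its own leaf
# and the closest clusters are merged step by step.
Z = linkage(X, method="ward")

# Cutting the resulting dendrogram at a distance threshold yields flat clusters.
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)
\end{verbatim}
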
One method of \emph{supervised classification} is the \emph{vector space model}.
It is well suited for finding items which are similar to a given item (= the
query or hedge). Usually, a simple distance measurement such as the Euclidean
distance provides results which are good enough, especially for online shops
where there are millions of products on offer and a more sophisticated approach
is too costly.

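A minimal sketch of such a lookup, with a randomly generated placeholder catalog
of feature vectors, ranks all items by their Euclidean distance to the query:
\begin{verbatim}
import numpy as np

# Hypothetical catalog: each row is the feature vector of one product.
catalog = np.random.rand(100_000, 16)
query = np.random.rand(16)

# Vector space model: rank items by Euclidean distance to the query vector.
distances = np.linalg.norm(catalog - query, axis=1)
most_similar = np.argsort(distances)[:10]  # indices of the 10 closest items
print(most_similar)
\end{verbatim}
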
Another method is \emph{k-nearest-neighbors}, which requires ground truth data.
Here, a new sample is classified by calculating the distances to the neighbors
within a given radius. The new datum is assigned to the class which contains the
closest samples.

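The following sketch implements this idea with the $k$ closest samples (the
usual formulation) instead of a fixed radius; the training points and labels are
placeholders:
\begin{verbatim}
import numpy as np

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by a majority vote among its k nearest neighbors."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical ground-truth samples (placeholder values).
X_train = np.array([[0.1, 0.1], [0.2, 0.2], [0.9, 0.8], [0.8, 0.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.85, 0.85])))  # -> 1
\end{verbatim}
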
\emph{K-means} requires information about the centroids of the individual
clusters. Distance measurements to the centroids determine to which cluster the
new sample belongs.

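The assignment step can be sketched as follows, with the centroids assumed to be
given (their values here are placeholders):
\begin{verbatim}
import numpy as np

# Hypothetical centroids of three clusters (placeholder values).
centroids = np.array([[0.2, 0.2], [0.5, 0.8], [0.9, 0.1]])

def assign_cluster(x):
    """Assign a sample to the cluster whose centroid is closest."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(assign_cluster(np.array([0.45, 0.7])))  # -> 1
\end{verbatim}
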
\emph{Self-organizing maps} are similar to k-means, but with two changes. First,
all data outside of the area of interest are ignored. Second, after a winning
cluster is found, its centroid is moved closer to the query object. The process
is repeated for all other clusters. This second variation on k-means constitutes
the first application of the concept of \emph{learning}.

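A minimal sketch of the update step, with placeholder codebook vectors and an
arbitrary learning rate, moves the winning unit a small step towards the new
sample:
\begin{verbatim}
import numpy as np

# Hypothetical map units (codebook vectors) and a new sample (placeholders).
units = np.array([[0.2, 0.2], [0.5, 0.8], [0.9, 0.1]])
sample = np.array([0.45, 0.7])
learning_rate = 0.1

# Find the winning unit and pull it a small step towards the sample.
winner = np.argmin(np.linalg.norm(units - sample, axis=1))
units[winner] += learning_rate * (sample - units[winner])
print(units[winner])
\end{verbatim}
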
\emph{Decision trees} divide the feature space into arbitrarily sized regions.
Multiple regions define a particular class. In practice this method is highly
prone to overfitting, which is why many trees are combined to form a random
forest classifier.

\emph{Random forest classifiers} construct many randomized decision trees and
combine their predictions, e.g. by majority vote. Such classifiers are also
called \emph{ensemble methods}.

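In scikit-learn such an ensemble can be sketched as follows; the data here are
random placeholders, so the fitted model and its predictions carry no meaning
beyond illustrating the API:
\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labeled data (random placeholder values).
X = np.random.rand(200, 5)
y = np.random.randint(0, 2, size=200)

# An ensemble of decision trees, each trained on a bootstrap sample of the data.
forest = RandomForestClassifier(n_estimators=100)
forest.fit(X, y)
print(forest.predict(X[:3]))
\end{verbatim}
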
\emph{Deep networks} started off as simple \emph{perceptrons}, which were unable
to solve the XOR problem. The conclusion was that there had to be additional
hidden layers and back-propagation to adjust the weights of the layers. Simply
stacking many hidden layers turned out to be ineffective too, because the
back-propagated error shrinks from layer to layer, so the early layers hardly
learn (\emph{vanishing gradients}). With \emph{convolutional neural networks}
(CNNs) all that changed, because they combine automatic feature engineering with
simple classification, processing on the GPU and effective training.

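The XOR point can be made concrete with a tiny network whose weights are set by
hand rather than learned: one hidden layer of two threshold units suffices to
compute XOR, which no single linear threshold unit can do.
\begin{verbatim}
def step(z):
    # Heaviside threshold: 1 if the weighted input is positive, else 0.
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hand-set weights: hidden unit 1 computes OR, hidden unit 2 computes AND,
    # the output unit fires for "OR but not AND", i.e. XOR.
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))
\end{verbatim}
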
The \emph{radial basis function network} is a simpler classifier which consists
of one input layer, one hidden layer and one output layer. In the hidden layer
we compare the input values to codebook vectors and employ a generalization of
negative convolution. In the output layer the outputs from the hidden layer are
multiplied by weights and summed. This results in the aforementioned dual
process model where negative convolution and positive convolution are employed
to form the output.

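A sketch of the forward pass, with placeholder codebook vectors, output weights
and Gaussian width (the mapping to negative and positive convolution follows the
text's terminology), could look like this:
\begin{verbatim}
import numpy as np

# Hypothetical codebook vectors of the hidden layer and output weights
# (placeholder values, not taken from the text).
codebook = np.array([[0.2, 0.2], [0.5, 0.8], [0.9, 0.1]])
weights = np.array([0.3, 1.2, -0.5])
width = 0.2

def rbf_forward(x):
    # Hidden layer: a Gaussian of the distance to each codebook vector
    # (the comparison/"negative convolution" step).
    dist = np.linalg.norm(codebook - x, axis=1)
    activations = np.exp(-dist ** 2 / (2 * width ** 2))
    # Output layer: weighted sum of the hidden activations.
    return float(np.dot(weights, activations))

print(rbf_forward(np.array([0.45, 0.7])))
\end{verbatim}
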
\section{Evaluation 200 words}
