# Exercise 3

## Analyzing Darkspace Evolution

>>>
Check results from [rep-14] again. Are they correlated? Think for a second
about the possible meaning of the analyzed time series being correlated. What
could be the reason why the drop in the number of unique IP sources after Jan
16 does not cause a proportional drop in the other signals?
>>>

Most of the signals are either strongly or moderately correlated. Looking at
the different correlations, one explanation for the missing proportional drop
is that someone was scanning the network or performing some kind of attack on
many different hosts. This hypothesis is supported by the high correlation of
the number of unique destination IPs with the number of packets and the number
of bytes sent. It would follow that, although the number of unique source IPs
dropped, a single IP address kept sending a lot of traffic to many unique
destination IPs.
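
As an illustration of how such a correlation check works, here is a minimal
sketch of the Pearson correlation coefficient. The daily counts are made up for
the example and are not the values from [rep-14]; the names `pearson`,
`unique_sources`, `unique_dests` and `packets` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily counts (assumption, NOT the exercise data).
unique_sources = [900, 950, 920, 300, 310, 290]  # drops sharply on day 4
unique_dests = [500, 520, 510, 505, 515, 498]
packets = [5000, 5200, 5100, 5050, 5150, 4980]   # 10 packets per destination

r_dst_pkts = pearson(unique_dests, packets)    # close to 1 by construction
r_src_pkts = pearson(unique_sources, packets)  # noticeably weaker

print(f"destinations vs packets: {r_dst_pkts:.2f}")
print(f"sources vs packets:      {r_src_pkts:.2f}")
```

In this toy series the packet count follows the destinations and barely reacts
to the drop in sources, which is the situation the hypothesis above describes.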

>>>
Check results from [rep-15] again. Do the results make sense for you? Would you
expect a different ratio in a normal network (no darkspace)?
>>>

In a normal network I would expect the ratio to be much closer to one, albeit
still higher than one. Thinking about my traffic at home, most requests have a
response associated with them, which keeps the ratio near one. A horizontal
scan of the network, for example, easily skews this ratio.

>>>
You used the median in [rep-15], but you could have used the mean. Does it make
any difference? What's better in your opinion? When to use mean and when
median? Can you figure out pros and cons for both measures of central tendency?
>>>

The median makes more sense in this case because it is robust against
outliers. The traffic data is very diverse and spread out, so the mean is
pulled toward the extreme values and looks very different from the median. The
mean is the better choice when the data is roughly symmetric and free of
extreme values, because it takes every observation into account.
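
A small sketch of this effect, using Python's standard `statistics` module. The
hourly counts are invented for the example (one spike among otherwise similar
values) and are not taken from [rep-15].

```python
from statistics import mean, median

# Made-up hourly packet counts with a single large spike (assumption).
counts = [100, 110, 95, 105, 102, 98, 5000]

print("mean:  ", mean(counts))    # pulled far upwards by the spike
print("median:", median(counts))  # stays with the bulk of the values
```

Here the single spike moves the mean well away from the typical value, while
the median stays where most of the data lies.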

## Analyzing a Short Darkspace Period

>>>
Do values in Table A and Table B coincide? If not, why?
>>>

The values mostly coincide, except for the sums. This is to be expected since
both datasets are from the same timeframe. The standard deviation of bytes is
higher in the daily table, probably because it includes times outside of this
particular month where a lot of bytes were sent.

>>>
Histograms, but particularly box plots, corresponding to hourly counts might
differ from the equivalent histograms and box plots calculated with daily
averaged data. Do you know why? Can you find an explanation?
>>>

The more fine-grained hourly data results in more striking differences,
especially in the box plots. Daily averaging smooths out short spikes in
traffic, whereas in the hourly data these spikes are more pronounced, so it
usually takes less data to elongate the whiskers of the box plots.
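
The smoothing effect of daily averaging can be sketched with a toy series. The
values below are hypothetical (a flat baseline with a single one-hour spike),
not the exercise data.

```python
from statistics import mean, pstdev

# Hypothetical hourly packet counts for 3 days (72 values): a flat baseline
# of 100 with a single one-hour spike on the second day.
hourly = [100] * 72
hourly[30] = 2500

# Daily averages: mean over each block of 24 consecutive hours.
daily = [mean(hourly[d * 24:(d + 1) * 24]) for d in range(3)]

print("hourly max:", max(hourly), "std:", round(pstdev(hourly), 1))
print("daily max: ", max(daily), "std:", round(pstdev(daily), 1))
```

The spike dominates the hourly series but is diluted across 24 hours in the
daily average, so the daily values are far less spread out.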

>>>
Make sure that you are familiar with the three main protocols appearing in the
`team13_protocol.csv` file. You should know their definition and what they are
used for.
>>>

The different protocols are:

* ICMP (Internet Control Message Protocol) with IP protocol number 1
* TCP (Transmission Control Protocol) with IP protocol number 6
* UDP (User Datagram Protocol) with IP protocol number 17

ICMP is mostly used for error reporting and diagnostics. Devices send ICMP
packets, for example, to check that a particular host is reachable or to alert
the sending device that a packet was too large to be forwarded without
fragmentation. ICMP can also be abused in denial-of-service attacks, where
victims are flooded with packets (ping flood) or sent malformed oversized
packets (ping of death).

TCP is the backbone of the internet, as most HTTP(S) traffic is sent over TCP.
It is connection-oriented: it establishes a connection between client and
server before any data is exchanged. TCP is well suited for applications that
require data to arrive in order and cannot tolerate dropped packets.

UDP is the opposite of TCP in that it is connectionless. No session is
established between client and server. Due to this property, it lends itself
well to applications such as VoIP, where data has to be sent quickly and
out-of-order or dropped packets matter little.

>>>
Did you get negative values in [rep-19]? Can you figure out why? And why not in
the case of packets?
>>>

The negative values arise because some source IPs appear under several
protocols (ICMP, TCP and UDP) and are therefore counted more than once. The
same goes for the destination IPs. Adding the per-protocol percentages together
gives more than 100%. Thus, the percentage of IPs _not_ belonging to these
protocols, computed as the remainder, must be smaller than 0%. With packets
this cannot happen, because each packet belongs to exactly one protocol.
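
The arithmetic can be reproduced with a small made-up example. The IP sets and
packet counts below are hypothetical (documentation addresses from
`192.0.2.0/24`), and computing the "other" share as a remainder is an
assumption about how [rep-19] was calculated.

```python
# Hypothetical per-protocol sets of source IPs; some IPs use several protocols.
sources = {
    "ICMP": {"192.0.2.1", "192.0.2.2"},
    "TCP": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "UDP": {"192.0.2.2", "192.0.2.4"},
}
total_unique = len(set().union(*sources.values()))  # distinct IPs overall

# Share of unique IPs per protocol; overlapping IPs are counted repeatedly.
ip_share = {p: 100 * len(ips) / total_unique for p, ips in sources.items()}
ip_other = 100 - sum(ip_share.values())

# Hypothetical packet counts; every packet belongs to exactly one protocol.
packets = {"ICMP": 10, "TCP": 70, "UDP": 15, "other": 5}
total_packets = sum(packets.values())
pkt_share = {p: 100 * packets[p] / total_packets for p in ("ICMP", "TCP", "UDP")}
pkt_other = 100 - sum(pkt_share.values())

print("IP shares:", ip_share, "other:", ip_other)
print("packet shares:", pkt_share, "other:", pkt_other)
```

With four distinct IPs the shares add up to 175%, so the remainder for IPs is
negative, while the packet shares can never exceed 100%.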

## Analyzing Temporal Patterns

>>>
Do signals in [rep-20] show periodicities?
>>>

Yes. The number of unique TCP source IPs in particular shows a strong diurnal
pattern. Every day it increases drastically from just after midnight until the
evening, then drops off sharply until midnight. There is also a distinct dip
around lunchtime.

The number of TCP packets, on the other hand, does not show any obvious
periodicity.
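
Beyond visual inspection, a diurnal pattern can be checked with the
autocorrelation at a lag of 24 hours. The following is a minimal sketch on a
synthetic series, not the [rep-20] data; the function name `autocorr` and the
sine-shaped signal are assumptions made for illustration.

```python
from math import pi, sin


def autocorr(xs, lag):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs)
    cov = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(n - lag))
    return cov / var


# Synthetic hourly series over 7 days with a 24-hour cycle (assumption).
hourly = [100 + 50 * sin(2 * pi * h / 24) for h in range(24 * 7)]

r24 = autocorr(hourly, 24)  # one full day apart: strongly positive
r12 = autocorr(hourly, 12)  # half a day apart: strongly negative

print(f"lag 24 h: {r24:.2f}")
print(f"lag 12 h: {r12:.2f}")
```

A signal without a daily rhythm would show no comparable peak at a lag of 24
hours.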