Exercise 3

Analyzing Darkspace Evolution

Check results from [rep-14] again. Are they correlated? Think for a second about the possible meaning of the analyzed time series being correlated. What could be the reason why the drop in the number of unique IP sources after Jan 16 does not cause a proportional drop in the other signals?

The results are mostly either strongly or moderately correlated. Looking at the individual correlations, a plausible explanation is that someone was scanning the network or performing some kind of attack on a large number of different hosts. This hypothesis is supported by the high correlation of unique destination IPs with the number of packets and the number of bytes sent. If a single IP address (or a handful of them) sends most of the traffic to many unique destination IPs, then a drop in the number of unique source IPs does not cause a proportional drop in the other signals.
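To make the argument about correlation more concrete, here is a minimal sketch of a Pearson correlation on toy data. The numbers are made up for illustration and are not taken from [rep-14]; the helper name `pearson` is also my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical daily values (assumption, not real darkspace data).
unique_sources = [900, 880, 910, 300, 310, 290]        # drops after day 3
unique_dests = [500, 520, 510, 515, 505, 525]
packets = [5000, 5200, 5100, 5150, 5050, 5250]         # follows unique_dests

r_dst_pkts = pearson(unique_dests, packets)    # destinations vs. packets
r_src_pkts = pearson(unique_sources, packets)  # sources vs. packets

print(f"dests vs. packets:   {r_dst_pkts:.2f}")
print(f"sources vs. packets: {r_src_pkts:.2f}")
```

In this toy example the packets track the unique destinations almost perfectly, while the drop in unique sources is barely reflected in the packet counts.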

Check results from [rep-15] again. Do the results make sense for you? Would you expect a different ratio in a normal network (no darkspace)?

In a normal network I would expect the ratio to be much closer to one, although still higher than one. Thinking about my traffic at home, most requests have a response associated with them. The ratio is easily skewed, for example by a horizontal scan of the network.

You used the median in [rep-15], but you could have used the mean. Does it make any difference? What's better in your opinion? When to use mean and when median? Can you figure out pros and cons for both measures of central tendency?

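The median makes more sense in this case because it is robust to outliers. The traffic data is very diverse and spread out, so the mean would look very different from the median. The mean takes every value into account and works well when the data is roughly symmetric without extreme values, but a single burst can dominate it. The median is insensitive to how extreme the outliers are, which is an advantage here, but it also hides bursts that might be of interest.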
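A small illustration of the difference, using made-up hourly packet counts (an assumption, not values from [rep-15]) in which one hour contains a burst:

```python
from statistics import mean, median

# Hypothetical packets-per-hour counts; the last hour contains a scan burst.
counts = [100, 110, 95, 105, 90, 10000]

print("mean:  ", mean(counts))    # pulled far upwards by the single burst
print("median:", median(counts))  # stays near the typical hourly value
```

The single burst moves the mean far away from the typical value, while the median barely reacts to it.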

Analyzing a Short Darkspace Period

Do values in Table A and Table B coincide? If not, why?

The values mostly coincide, except for the sums of course. This is to be expected since both datasets are from the same timeframe. The standard deviation of bytes is higher in the daily table, probably because it includes times outside of this particular month in which a lot of bytes were sent.

Histograms, but particularly box plots, corresponding to hourly counts might differ from the equivalent histograms and box plots calculated with daily averaged data. Do you know why? Can you find an explanation?

The hourly data is more fine-grained, which results in more striking differences, especially in the box plots. Spikes in traffic are more pronounced in hourly counts, whereas daily averaging smooths them out, so it usually takes less traffic to elongate the whiskers of the hourly box plots.
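The smoothing effect can be sketched with toy numbers. The hourly counts below are hypothetical (two days, one of them with a single burst hour) and only serve to show how averaging per day shrinks the spread:

```python
from statistics import mean

# Hypothetical hourly packet counts for two days (24 values each).
day1 = [100] * 23 + [2500]   # one hour with a burst
day2 = [100] * 24            # a quiet day
hourly = day1 + day2

# Average the hourly counts per day (chunks of 24 hours).
daily = [mean(hourly[i:i + 24]) for i in range(0, len(hourly), 24)]

# Ratio between the largest and the smallest value at each granularity.
hourly_max_ratio = max(hourly) / min(hourly)
daily_max_ratio = max(daily) / min(daily)

print("daily averages:", daily)
print("hourly max/min:", hourly_max_ratio)
print("daily max/min: ", daily_max_ratio)
```

The burst hour is 25 times larger than a normal hour, but the day that contains it is only twice as large as the quiet day once the counts are averaged.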

Make sure that you are familiar with the three main protocols appearing in the team13_protocol.csv file. You should know their definition and what they are used for.

The different protocols are:

  • ICMP (Internet Control Message Protocol) with Identifier 1
  • TCP (Transmission Control Protocol) with Identifier 6
  • UDP (User Datagram Protocol) with Identifier 17
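These identifiers are the IANA-assigned IP protocol numbers. As a quick check, they can be printed from the constants in Python's standard `socket` module; the dictionary name `PROTOCOLS` is only chosen for this sketch:

```python
import socket

# IP protocol numbers as exposed by the standard socket module.
PROTOCOLS = {
    "ICMP": socket.IPPROTO_ICMP,
    "TCP": socket.IPPROTO_TCP,
    "UDP": socket.IPPROTO_UDP,
}

for name, number in PROTOCOLS.items():
    print(f"{name}: {int(number)}")
```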

ICMP is mostly used for error reporting and diagnostics. Devices send ICMP packets, for example, to check whether a particular host is reachable or to alert the sending device that a packet was too large. ICMP can also be abused in attacks, for example in DDoS attacks where victims are flooded with packets (ping flood) or in the so-called ping of death.

TCP is the backbone of the internet, as most HTTP(S) traffic (HTTP/1.1 and HTTP/2) is sent over TCP. It is connection-oriented, meaning it establishes a session between client and server. TCP is well suited for applications that require data to arrive in order and cannot tolerate dropped packets.

UDP is the counterpart of TCP, as it is connectionless. No session is established between client and server. Due to this property, it lends itself well to applications such as VoIP, where data has to be sent quickly and out-of-order or dropped packets matter less.

Did you get negative values in [rep-19]? Can you figure out why? And why not in the case of packets?

The negative values come from the fact that some source IPs appear in more than one protocol (ICMP, TCP and UDP). The same goes for the destination IPs. Adding the per-protocol percentages together therefore gives more than 100%, so the percentage of IPs not belonging to these protocols ends up below 0%. This cannot happen with packets, because a packet never belongs to several protocols at once: each packet is counted under exactly one protocol.
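The effect can be reproduced with a toy example. It assumes that the "other" share in [rep-19] is computed as 100% minus the sum of the per-protocol shares; the IP addresses and packet counts below are hypothetical.

```python
# Hypothetical source IPs seen per protocol (the same IP may use several).
sources = {
    "ICMP": {"10.0.0.1", "10.0.0.2"},
    "TCP": {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
    "UDP": {"10.0.0.2", "10.0.0.4"},
}
all_sources = set().union(*sources.values())  # 4 distinct IPs in total

# Share of distinct IPs per protocol; overlapping IPs are counted repeatedly.
ip_share = {p: 100 * len(ips) / len(all_sources) for p, ips in sources.items()}
other_ips = 100 - sum(ip_share.values())

# Hypothetical packet counts; every packet has exactly one protocol.
packets = {"ICMP": 10, "TCP": 70, "UDP": 15}
total_packets = 100  # includes 5 packets of other protocols
pkt_share = {p: 100 * n / total_packets for p, n in packets.items()}
other_packets = 100 - sum(pkt_share.values())

print("other IPs (%):    ", other_ips)
print("other packets (%):", other_packets)
```

The IP shares add up to 175% because of the overlap, so the remainder becomes negative, whereas the packet shares can never exceed 100%.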