Answer questions for ex2 part 1 and 2
parent fc192a2588
commit ddc2e8daf7

@@ -4,6 +4,30 @@
Login via `ssh` to the Lab Environment and `cd working_directory`.

>>>
Do you think that Go-Flows has any advantage compared with tcpdump?
>>>

Go-Flows has an advantage over tcpdump when many customized options for
filtering the traffic capture are needed. Other than that, tcpdump is usually
already known and easy to get started with. For simple filtering purposes I
consider tcpdump to be faster than Go-Flows.

>>>
What are the proportions of TCP, UDP, and ICMP traffic? And traffic that is not
TCP, UDP, or ICMP?
>>>

About half (~47%) of the capture is TCP traffic. ICMP traffic is about 40% and
UDP traffic about 7%. The rest of the traffic makes up about 6%.

>>>
How much traffic is related to websites (HTTP, HTTPS)? And DNS traffic?
>>>

HTTP traffic: ~14.12%
HTTPS traffic: ~15.25%
DNS traffic: ~0.82%
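
One way these shares can be estimated from the flow export described in rep-10
below (a minimal sketch; the IPFIX-style column names `sourceTransportPort`,
`destinationTransportPort`, and `octetDeltaCount` are assumptions and must be
adjusted to the actual header of `Ex2_team13.csv`):

```python
import pandas as pd

# Hypothetical illustration: column names follow the IPFIX naming that
# go-flows can export; adjust them to the actual header of the CSV.
df = pd.read_csv("Ex2_team13.csv")
total_bytes = df["octetDeltaCount"].sum()

def byte_share(port):
    # Count traffic if either endpoint uses the well-known port.
    mask = (df["sourceTransportPort"] == port) | (df["destinationTransportPort"] == port)
    return 100 * df.loc[mask, "octetDeltaCount"].sum() / total_bytes

print(f"HTTP : {byte_share(80):.2f} %")
print(f"HTTPS: {byte_share(443):.2f} %")
print(f"DNS  : {byte_share(53):.2f} %")
```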

### rep-10

Run the following command inside `working_directory`:

@@ -27,7 +51,8 @@ After running the command

we get the file `Ex2_team13.csv`.

The following python script quickly extracts the `protocolIdentifier` and their
occurrences:

```python
import numpy as np
@@ -52,6 +77,26 @@ Output:
Name: protocolIdentifier, dtype: int64
```
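
To turn these counts into the proportions quoted in the answer above, the same
column can be normalised. A self-contained sketch (it assumes one row per
record in `Ex2_team13.csv`; weight by a byte column instead if byte-based
shares are wanted):

```python
import pandas as pd

# Protocol shares in percent instead of raw counts.
df = pd.read_csv("Ex2_team13.csv")
share = df["protocolIdentifier"].value_counts(normalize=True) * 100

# IANA protocol numbers: 6 = TCP, 17 = UDP, 1 = ICMP
for proto, name in [(6, "TCP"), (17, "UDP"), (1, "ICMP")]:
    print(f"{name}: {share.get(proto, 0):.2f} %")
print(f"Other: {100 - sum(share.get(p, 0) for p in (6, 17, 1)):.2f} %")
```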

## From Pcap to Flow Vectors

>>>
Remember that here we have extracted flows within a time-frame of 10 seconds.
Can you think about legitimate and illegitimate situations for case (c), i.e., a
source sending traffic to many different destinations in a short time?
>>>

TBA

>>>
You can additionally count the number of flows that show TCP, UDP, ICMP, and
other IP protocols as "mode" protocol. Do you think that you will get a similar
proportion as in [rep-11]? Beyond answering "yes" or "no", think about reasons
that might make such proportions similar or different (there are some that are
worth considering).
>>>

TBA
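
One possible starting point for that count (a sketch only, analogous to the
packet-level snippet above; it assumes the flow export `Ex2flows_team13.csv`
from rep-12 exposes the protocol in a `protocolIdentifier` column, which may
differ from the actual mode-feature name):

```python
import pandas as pd

# Count flows per "mode" protocol; file and column names are assumptions.
flows = pd.read_csv("Ex2flows_team13.csv")
share = flows["protocolIdentifier"].value_counts(normalize=True) * 100

# IANA protocol numbers: 6 = TCP, 17 = UDP, 1 = ICMP
for proto, name in [(6, "TCP"), (17, "UDP"), (1, "ICMP")]:
    print(f"{name} flows: {share.get(proto, 0):.2f} %")
print(f"Other flows: {100 - sum(share.get(p, 0) for p in (6, 17, 1)):.2f} %")
```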

### rep-12

After running the command

@@ -60,8 +105,8 @@ After running the command

we get the file `Ex2flows_team13.csv`.

The following python script quickly extracts the percentage of sources
communicating with one or more than ten destinations:

```python
import pandas as pd
@@ -86,4 +131,28 @@ Output:
Length of dataset: 209434
Single Destination: 94.901 %
More than 10 destinations: 0.796 %
```
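
The script itself lies outside this hunk; for reference, one way such a
per-source fan-out count can be written (a sketch, not necessarily the script
used; the IPFIX-style address column names are assumptions):

```python
import pandas as pd

# Assumed column names (IPFIX style); adjust to the actual flow CSV header.
flows = pd.read_csv("Ex2flows_team13.csv")
print(f"Length of dataset: {len(flows)}")

# Number of distinct destinations contacted by each source.
fanout = flows.groupby("sourceIPv4Address")["destinationIPv4Address"].nunique()

print(f"Single Destination: {(fanout == 1).mean() * 100:.3f} %")
print(f"More than 10 destinations: {(fanout > 10).mean() * 100:.3f} %")
```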

## From Pcap to Aggregated Vectors

>>>
It is obvious that the three explored time series have different
orders of magnitude, but are they correlated? Time series must be plotted, so we
encourage you to do that. Depending on the analysis platform (Python, MATLAB, R,
etc.), you have commands that evaluate correlations between signals by outputting
a numerical value (0: no correlation, 1: maximum direct correlation, -1: maximum
inverse correlation). However, whenever possible, we recommend using plots and
visual representations. Plot the three time series. To better assess
correlations, you can scale/normalize signals before plotting them.
>>>

TBA
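
A possible starting point once this is answered (a sketch; the file name
`Ex2agg_team13.csv` and the column names `packets`, `bytes`, `flows` are
assumptions that must be adapted to the actual aggregated export):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names for the three aggregated time series.
ts = pd.read_csv("Ex2agg_team13.csv")[["packets", "bytes", "flows"]]

# z-score normalisation so the different orders of magnitude share one scale.
normalised = (ts - ts.mean()) / ts.std()
normalised.plot(title="Normalised aggregated time series")
plt.show()

# Pairwise Pearson correlation (-1: inverse, 0: none, 1: direct).
print(ts.corr())
```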

>>>
Additionally, you can assess value distributions by plotting histograms. We
recommend also plotting central tendency values (mean, median, standard
deviation) superposed on the histograms to check if they are representative of
the data. Are they?
>>>

TBA
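
Similarly, a histogram with the central tendency values overlaid could be
produced roughly like this (a sketch; file and column names are the same
assumptions as above):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file/column names; repeat the plot for each of the three series.
values = pd.read_csv("Ex2agg_team13.csv")["packets"]

plt.hist(values, bins=50)
plt.axvline(values.mean(), color="red", label="mean")
plt.axvline(values.median(), color="green", label="median")
plt.axvline(values.mean() + values.std(), color="orange", linestyle="--", label="mean ± std")
plt.axvline(values.mean() - values.std(), color="orange", linestyle="--")
plt.legend()
plt.title("Histogram with central tendency markers")
plt.show()
```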