Answer questions for ex2 part 1 and 2

2021-05-21 12:53:39 +02:00 · ddc2e8daf7
commit ddc2e8daf7
parent fc192a2588
1 changed file with 73 additions and 4 deletions
--- a/ex2/README.md
+++ b/ex2/README.md
@@ -4,6 +4,30 @@

 Login via `ssh` to the Lab Environment and `cd working_directory`.

+>>>
+Do you think that Go-Flows has any advantage compared with tcpdump?
+>>>
+
+Go-Flows has an advantage over tcpdump when many customized options for
+filtering the traffic capture are needed. Otherwise, tcpdump is usually already
+familiar and easy to get started with. For simple filtering purposes I consider
+tcpdump to be faster than Go-Flows.
+
+>>>
+What are the proportions of TCP, UDP, and ICMP traffic? And traffic that is not
+TCP, UDP, or ICMP?
+>>>
+
+About half (~47%) of the capture is TCP traffic. ICMP traffic is about 40% and
+UDP traffic about 7%. The rest of the traffic makes up about 6%.
+
+>>>
+How much traffic is related to websites (HTTP, HTTPS)? And DNS traffic?
+>>>
+- HTTP traffic: ~14.12%
+- HTTPS traffic: ~15.25%
+- DNS traffic: ~0.82%
+
 ### rep-10

 Run the following command inside `working_directory`:
@@ -27,7 +51,8 @@ After running the command

 we get the file `Ex2_team13.csv`.

-The following python script quickly extracts the `protocolIdentifier` and their occurrences:
+The following Python script quickly counts the occurrences of each
+`protocolIdentifier` value:

 ```python
 import numpy as np
@@ -52,6 +77,26 @@ Output:
 Name: protocolIdentifier, dtype: int64
 ```

+## From Pcap to Flow Vectors
+
+>>>
+Remember that here we have extracted flows within a time-frame of 10 seconds.
+Can you think about legitimate and illegitimate situations for case (c), i.e., a
+source sending traffic to many different destinations in a short time?
+>>>
+
+TBA
+
+>>>
+You can additionally count the number of flows that show TCP, UDP, ICMP, and
+other IP protocols as "mode" protocol. Do you think that you will get a similar
+proportion as in [rep-11]? Beyond answering "yes" or "no", think about reasons
+that might make such proportions similar or different (there are some that are
+worth considering).
+>>>
+
+TBA
+
 ### rep-12

 After running the command
@@ -60,8 +105,8 @@ After running the command

 we get the file `Ex2flows_team13.csv`.

-The following python script quickly extracts the 
-percentage of sources communicating with one or more than ten destinations:
+The following Python script quickly computes the percentage of sources
+communicating with exactly one destination and with more than ten destinations:

 ```python
 import pandas as pd
@@ -86,4 +131,28 @@ Output:
 Length of dataset: 209434
 Single Destination: 94.901 %
 More than 10 destinations: 0.796 %
-```
+```
+
+## From Pcap to Aggregated Vectors
+
+>>>
+It is obvious that the three explored time series have different
+order-of-magnitude, but are they correlated? Time series must be plotted, so we
+encourage you to do that. Depending on the analysis platform (Python, MATLAB, R,
+etc.), you have commands that evaluate correlations between signals by outputting
+a numerical value (0: no correlation, 1: maximum direct correlation, -1: maximum
+inverse correlation). However, whenever possible, we recommend using plots and
+visual representations. Plot the three time-series. To better assess
+correlations, you can scale/normalize signals before plotting them.
+>>>
+
+TBA
+
+>>>
+Additionally, you can assess value distributions by plotting histograms. We
+recommend also plotting central tendency values (mean, median, standard
+deviation) superposed on the histograms to check if they are representative of
+the data. Are they?
+>>>
+
+TBA
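
For the two questions still marked TBA in "From Pcap to Aggregated Vectors", the numeric side could be sketched as below, using only the Python standard library. This is a minimal illustration, not part of the commit: the names `series_a`, `series_b`, `series_c` and their values are hypothetical placeholders, not data from the capture, and the helper functions are made up for this example. It shows min-max scaling (useful before plotting series of different magnitude), the Pearson correlation coefficient in the range [-1, 1], and the central tendency values that the question suggests overlaying on the histograms.

```python
import math
import statistics


def min_max_scale(series):
    """Rescale a series to [0, 1] so that series of different magnitude
    can be compared on the same plot."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long series.
    Raises ZeroDivisionError if either series is constant."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def summary(series):
    """Mean, median, and population standard deviation of a series."""
    return {
        "mean": statistics.fmean(series),
        "median": statistics.median(series),
        "stdev": statistics.pstdev(series),
    }


# Hypothetical placeholder values, NOT taken from the capture.
series_a = [1, 2, 3, 4, 5]
series_b = [10, 20, 30, 40, 50]
series_c = [5, 4, 3, 2, 1]

if __name__ == "__main__":
    print("a vs b:", pearson(series_a, series_b))
    print("a vs c:", pearson(series_a, series_c))
    print("scaled b:", min_max_scale(series_b))
    print("summary a:", summary(series_a))
```

Pearson correlation is unaffected by min-max scaling, so the scaling step only matters for the plots. A mean that lies far from the median in `summary` hints at a skewed distribution, in which case the mean alone is a poor representative of the data.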