diff --git a/ex2/README.md b/ex2/README.md
index a7f1a78..9dcbb45 100644
--- a/ex2/README.md
+++ b/ex2/README.md
@@ -4,6 +4,30 @@
 
 Login via `ssh` to the Lab Environment and `cd working_directory`.
 
+>>>
+Do you think that Go-Flows has any advantage compared with tcpdump?
+>>>
+
+Go-Flows has the advantage over tcpdump if a lot of customized options to
+filter the traffic capture is needed. Other than that, tcpdump is usually already
+known and easy to get started with. For simple filtering purposes I consider
+tcpdump to be faster than Go-Flows.
+
+>>>
+What are the proportions of TCP, UDP, and ICMP traffic? And traffic that is not
+TCP, UDP, or ICMP?
+>>>
+
+About half (~47%) of the capture is TCP traffic. ICMP traffic is about 40% and
+UDP traffic about 7%. The rest of the traffic makes up about 6%.
+
+>>>
+How much traffic is related to websites (HTTP, HTTPS)? And DNS traffic?
+>>>
+HTTP traffic: ~14.12%
+HTTPS traffic: ~15.25%
+DNS traffic: ~00.82%
+
 ### rep-10
 
 Run the following command inside `working_directory`:
@@ -27,7 +51,8 @@ After running the command
 
 we get the file `Ex2_team13.csv`.
 
-The following python script quickly extracts the `protocolIdentifier` and their occurrences:
+The following python script quickly extracts the `protocolIdentifier` and their
+occurrences:
 
 ```python
 import numpy as np
@@ -52,6 +77,26 @@ Output:
 Name: protocolIdentifier, dtype: int64
 ```
 
+## From Pcap to Flow Vectors
+
+>>>
+Remember that here we have extracted flows within a time-frame of 10 seconds.
+Can you think about legitimate and illegitimate situations for case (c), i.e., a
+source sending traffic to many different destinations in a short time?
+>>>
+
+TBA
+
+>>>
+You can additionally count the number of flows that show TCP, UDP, ICMP, and
+other IP protocols as "mode" protocol. Do you think that you will get a similar
+proportion as in [rep-11]? Beyond answering "yes" or "no", think about reasons
+that might make such proportions similar or different (there are some that are
+worth considering).
+>>>
+
+TBA
+
 ### rep-12
 
 After running the command
@@ -60,8 +105,8 @@ After running the command
 
 we get the file `Ex2flows_team13.csv`.
 
-The following python script quickly extracts the
-percentage of sources communicating with one or more than ten destinations:
+The following python script quickly extracts the percentage of sources
+communicating with one or more than ten destinations:
 
 ```python
 import pandas as pd
@@ -86,4 +131,28 @@ Output:
 Length of dataset: 209434
 Single Destination: 94.901 %
 More than 10 destinations: 0.796 %
-```
\ No newline at end of file
+```
+
+## From Pcap to Aggregated Vectors
+
+>>>
+It is obvious that the three explored time series have different
+order-of-magnitude, but are they correlated? Time series must be plotted, so we
+encourage you to do that. Depending on the analysis platform (Python, MATLAB, R,
+etc.), you have commands that evaluate correlations between signals by outputting
+a numerical value (0: no correlation, 1: maximum direct correlation, -1: maximum
+inverse correlation). However, whenever possible, we recommend using plots and
+visual representations. Plot the three time-series. To better assess
+correlations, you can scale/normalize signals before plotting them.
+>>>
+
+TBA
+
+>>>
+Additionally, you can assess value distributions by plotting histograms. We
+recommend also plotting central tendency values (mean, median, standard
+deviation) superposed on the histograms to check if they are representative of
+the data. Are they?
+>>>
+
+TBA