Answer questions for ex2 part 1 and 2
parent fc192a2588
commit ddc2e8daf7

@@ -4,6 +4,30 @@
Login via `ssh` to the Lab Environment and `cd working_directory`.

>>>
Do you think that Go-Flows has any advantage compared with tcpdump?
>>>

Go-Flows has an advantage over tcpdump when many customized options for
filtering the traffic capture are needed. Other than that, tcpdump is usually
already known and easy to get started with. For simple filtering purposes I
consider tcpdump to be faster than Go-Flows.

>>>
What are the proportions of TCP, UDP, and ICMP traffic? And traffic that is not
TCP, UDP, or ICMP?
>>>

About half (~47%) of the capture is TCP traffic. ICMP traffic is about 40% and
UDP traffic about 7%. The rest of the traffic makes up about 6%.

>>>
How much traffic is related to websites (HTTP, HTTPS)? And DNS traffic?
>>>

HTTP traffic: ~14.12%
HTTPS traffic: ~15.25%
DNS traffic: ~0.82%
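
One way these shares can be estimated from the flow export described in rep-10
below (a minimal sketch; the IPFIX-style column names `sourceTransportPort`,
`destinationTransportPort`, and `octetDeltaCount` are assumptions and must be
adjusted to the actual header of `Ex2_team13.csv`):

```python
import pandas as pd

# Hypothetical illustration: column names follow the IPFIX naming that
# go-flows can export; adjust them to the actual header of the CSV.
df = pd.read_csv("Ex2_team13.csv")
total_bytes = df["octetDeltaCount"].sum()

def byte_share(port):
    # Count traffic if either endpoint uses the well-known port.
    mask = (df["sourceTransportPort"] == port) | (df["destinationTransportPort"] == port)
    return 100 * df.loc[mask, "octetDeltaCount"].sum() / total_bytes

print(f"HTTP : {byte_share(80):.2f} %")
print(f"HTTPS: {byte_share(443):.2f} %")
print(f"DNS  : {byte_share(53):.2f} %")
```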

### rep-10

Run the following command inside `working_directory`:

@@ -27,7 +51,8 @@ After running the command

we get the file `Ex2_team13.csv`.

The following python script quickly extracts the `protocolIdentifier` and their
occurrences:

```python
import numpy as np
@@ -52,6 +77,26 @@ Output:
Name: protocolIdentifier, dtype: int64
```
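
To turn these counts into the proportions quoted in the answer above, the same
column can be normalised. A self-contained sketch (it assumes one row per
record in `Ex2_team13.csv`; weight by a byte column instead if byte-based
shares are wanted):

```python
import pandas as pd

# Protocol shares in percent instead of raw counts.
df = pd.read_csv("Ex2_team13.csv")
share = df["protocolIdentifier"].value_counts(normalize=True) * 100

# IANA protocol numbers: 6 = TCP, 17 = UDP, 1 = ICMP
for proto, name in [(6, "TCP"), (17, "UDP"), (1, "ICMP")]:
    print(f"{name}: {share.get(proto, 0):.2f} %")
print(f"Other: {100 - sum(share.get(p, 0) for p in (6, 17, 1)):.2f} %")
```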

## From Pcap to Flow Vectors

>>>
Remember that here we have extracted flows within a time-frame of 10 seconds.
Can you think about legitimate and illegitimate situations for case (c), i.e., a
source sending traffic to many different destinations in a short time?
>>>

TBA

>>>
You can additionally count the number of flows that show TCP, UDP, ICMP, and
other IP protocols as "mode" protocol. Do you think that you will get a similar
proportion as in [rep-11]? Beyond answering "yes" or "no", think about reasons
that might make such proportions similar or different (there are some that are
worth considering).
>>>

TBA
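
One possible starting point for that count (a sketch only, analogous to the
packet-level snippet above; it assumes the flow export `Ex2flows_team13.csv`
from rep-12 exposes the protocol in a `protocolIdentifier` column, which may
differ from the actual mode-feature name):

```python
import pandas as pd

# Count flows per "mode" protocol; file and column names are assumptions.
flows = pd.read_csv("Ex2flows_team13.csv")
share = flows["protocolIdentifier"].value_counts(normalize=True) * 100

# IANA protocol numbers: 6 = TCP, 17 = UDP, 1 = ICMP
for proto, name in [(6, "TCP"), (17, "UDP"), (1, "ICMP")]:
    print(f"{name} flows: {share.get(proto, 0):.2f} %")
print(f"Other flows: {100 - sum(share.get(p, 0) for p in (6, 17, 1)):.2f} %")
```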

### rep-12

After running the command

@@ -60,8 +105,8 @@ After running the command

we get the file `Ex2flows_team13.csv`.

The following python script quickly extracts the percentage of sources
communicating with one or more than ten destinations:

```python
import pandas as pd
@@ -86,4 +131,28 @@ Output:
Length of dataset: 209434
Single Destination: 94.901 %
More than 10 destinations: 0.796 %
```
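
The script itself lies outside this hunk; for reference, one way such a
per-source fan-out count can be written (a sketch, not necessarily the script
used; the IPFIX-style address column names are assumptions):

```python
import pandas as pd

# Assumed column names (IPFIX style); adjust to the actual flow CSV header.
flows = pd.read_csv("Ex2flows_team13.csv")
print(f"Length of dataset: {len(flows)}")

# Number of distinct destinations contacted by each source.
fanout = flows.groupby("sourceIPv4Address")["destinationIPv4Address"].nunique()

print(f"Single Destination: {(fanout == 1).mean() * 100:.3f} %")
print(f"More than 10 destinations: {(fanout > 10).mean() * 100:.3f} %")
```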

## From Pcap to Aggregated Vectors

>>>
It is obvious that the three explored time series have different
orders of magnitude, but are they correlated? Time series must be plotted, so we
encourage you to do that. Depending on the analysis platform (Python, MATLAB, R,
etc.), you have commands that evaluate correlations between signals by outputting
a numerical value (0: no correlation, 1: maximum direct correlation, -1: maximum
inverse correlation). However, whenever possible, we recommend using plots and
visual representations. Plot the three time series. To better assess
correlations, you can scale/normalize signals before plotting them.
>>>

TBA
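
A possible starting point once this is answered (a sketch; the file name
`Ex2agg_team13.csv` and the column names `packets`, `bytes`, `flows` are
assumptions that must be adapted to the actual aggregated export):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names for the three aggregated time series.
ts = pd.read_csv("Ex2agg_team13.csv")[["packets", "bytes", "flows"]]

# z-score normalisation so the different orders of magnitude share one scale.
normalised = (ts - ts.mean()) / ts.std()
normalised.plot(title="Normalised aggregated time series")
plt.show()

# Pairwise Pearson correlation (-1: inverse, 0: none, 1: direct).
print(ts.corr())
```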

>>>
Additionally, you can assess value distributions by plotting histograms. We
recommend also plotting central tendency values (mean, median, standard
deviation) superposed on the histograms to check if they are representative of
the data. Are they?
>>>

TBA
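
Similarly, a histogram with the central tendency values overlaid could be
produced roughly like this (a sketch; file and column names are the same
assumptions as above):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file/column names; repeat the plot for each of the three series.
values = pd.read_csv("Ex2agg_team13.csv")["packets"]

plt.hist(values, bins=50)
plt.axvline(values.mean(), color="red", label="mean")
plt.axvline(values.median(), color="green", label="median")
plt.axvline(values.mean() + values.std(), color="orange", linestyle="--", label="mean ± std")
plt.axvline(values.mean() - values.std(), color="orange", linestyle="--")
plt.legend()
plt.title("Histogram with central tendency markers")
plt.show()
```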