Reference no: EM133058123
Task Description
Instructions:
In industry, you need to communicate with your manager, client, other stakeholders, and/or the IT team to understand the source of the data and to gather it.
Here, the teaching team has already gathered the data for you.
In this task, you need to perform experiments on TWO DATASETS.
1. The first dataset, "NSL-KDD", can be obtained from the data folder, in the "Week_5_NSL-KDD-Dataset" subfolder.
2. The second dataset is the "Processed Combined IoT dataset".
Step 1: Predictive Modelling (Prediction of the classes)
Dataset 1:
The DecisionTreeClassifier has been implemented for you. Now, you need to implement other techniques and compare. Please do the following tasks:
1. Implement at least 5 benchmark classification algorithms.
2. Tune the parameters if applicable to obtain a good solution.
3. Obtain the confusion matrix for each of the scenarios (Use the test dataset).
4. Calculate the performance measures for each of the classification algorithms, including Precision (%), Recall (%), F-Score (%), and False Alarm Rate - FPR (%).
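The four tasks above can be sketched as a single loop over candidate models. This is a minimal sketch, not the required solution: it assumes scikit-learn is available, uses a synthetic dataset from `make_classification` as a stand-in for the preprocessed NSL-KDD features, and the five algorithms and their parameter grids are illustrative choices rather than ones prescribed by the task.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): replace with the real train/test files.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# name -> (estimator, small illustrative parameter grid)
candidates = {
    "LogisticRegression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "KNeighbors": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "GaussianNB": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
    "RandomForest": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
    "SVC": (SVC(), {"C": [0.1, 1.0]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Tune on the training data only, using 3-fold cross-validation.
    search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    # Evaluate the refitted best model on the held-out test data.
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, res in results.items():
    print(f"{name}: accuracy={res['accuracy']:.3f}, best_params={res['best_params']}")
```

Tuning inside `GridSearchCV` on the training portion keeps the test set untouched until the final evaluation, which is why the confusion matrix is computed only from `X_test`.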
Present the results in the format of the table below. Create one table for each algorithm (use the test dataset).
| Attack Class | Precision (%) | Recall (%)... | ... | ... | ... | ... |
|---|---|---|---|---|---|---|
| DoS | | | | | | |
| Normal | | | | | | |
| Prob | | | | | | |
| R2L | | | | | | |
| U2R | | | | | | |
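The per-class measures in this table can all be derived from a multi-class confusion matrix by treating each class in turn as "positive" and the rest as "negative". The sketch below assumes the scikit-learn convention (rows are true classes, columns are predicted classes); the function name `per_class_metrics` and the 3x3 counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def per_class_metrics(cm, labels):
    """Compute precision, recall, F-score and FPR (all in %) for each class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                    # true class i, predicted as another class
        fp = sum(row[i] for row in cm) - tp     # other classes predicted as class i
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if (precision + recall) else 0.0)
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        metrics[label] = {
            "precision": round(100 * precision, 2),
            "recall": round(100 * recall, 2),
            "f_score": round(100 * f_score, 2),
            "fpr": round(100 * fpr, 2),
        }
    return metrics


# Made-up counts for illustration only; not results from NSL-KDD.
labels = ["DoS", "Normal", "Prob"]
example_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]
metrics = per_class_metrics(example_cm, labels)
for label, values in metrics.items():
    print(label, values)
```

The zero-division guards matter for rare classes such as U2R, where a model may never predict the class at all.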
Finally, summarize the results in a table similar to the one below (use the test dataset):
| Algorithms | Accuracy (%) | Precision (%) | Recall (%)... | ... | ... | ... | ... |
|---|---|---|---|---|---|---|---|
| Alg 1 | | | | | | | |
| Alg 2 | | | | | | | |
| ... | | | | | | | |
| ... | | | | | | | |
| ... | | | | | | | |
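One row of this summary table needs a single number per measure for each algorithm. The task does not say how per-class values should be combined, so the sketch below assumes macro-averaging (an unweighted mean over classes); a weighted average is an equally plausible reading. The helper `summary_row` and the counts are hypothetical.

```python
def summary_row(name, cm):
    """Return accuracy and macro-averaged precision/recall (in %) for one algorithm.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls = [], []
    for i in range(n):
        predicted_i = sum(row[i] for row in cm)   # column sum
        actual_i = sum(cm[i])                     # row sum
        precisions.append(cm[i][i] / predicted_i if predicted_i else 0.0)
        recalls.append(cm[i][i] / actual_i if actual_i else 0.0)
    return {
        "algorithm": name,
        "accuracy": round(100 * accuracy, 2),
        "precision": round(100 * sum(precisions) / n, 2),
        "recall": round(100 * sum(recalls) / n, 2),
    }


# Made-up counts for illustration only; not real results.
example_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]
row = summary_row("Alg 1", example_cm)
print(f"| {row['algorithm']} | {row['accuracy']} | {row['precision']} | {row['recall']} |")
```

Whichever averaging scheme is chosen, stating it in the report makes the summary comparable with published figures.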
Dataset 2:
A sample Random Forest implementation is given to you. Repeat the procedure described for Dataset 1. The only difference is that you need to use a 70:30 train-test split (70% for training and 30% for testing), as there is no separate test set file. Please note that k-fold cross-validation is also acceptable. However, as k-fold cross-validation will take a huge amount of time, we have not made it mandatory.
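A 70:30 split and the optional k-fold alternative might look like the following. This is a sketch under stated assumptions: scikit-learn is available, the data is a synthetic stand-in rather than the IoT dataset, and stratifying the split by class label is a design choice of this example, not a requirement of the task.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in data (assumption): replace with the processed IoT dataset.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=2, random_state=42)

# 70% train / 30% test; stratify keeps class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)
holdout_accuracy = clf.score(X_test, y_test)
print(f"Hold-out accuracy: {holdout_accuracy:.3f}")

# Optional alternative: 5-fold cross-validation over the whole dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=42), X, y, cv=cv)
print(f"5-fold accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Cross-validation trains one model per fold, so on a large dataset it costs roughly k times as much as the single hold-out fit.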
Comparison of Results:
Your results need to be comparable against benchmark algorithms. For example, for Dataset 1, see the results reported in the recent article "An Adaptive Ensemble Machine Learning Model for Intrusion Detection", published in IEEE Access, July 2019.
For Dataset 2, please see the article "TON_IoT Telemetry Dataset: A New Generation Dataset of IoT and IIoT for Data-Driven Intrusion Detection Systems".
Your results will not be exactly the same, and that is nothing to worry about. Your target is to select the best-performing algorithms you can and achieve comparable results.
Step 2: Data Visualization
Perform the following tasks for both of the datasets:
1. Visualize and compare the accuracy of different algorithms.
2. Plot the confusion matrix for each scenario.
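Both visualization tasks can be covered with plain matplotlib. The sketch below assumes matplotlib is installed and uses placeholder accuracy values and a made-up confusion matrix; none of the numbers are real results, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Placeholder values for illustration only; substitute your measured results.
accuracies = {"Alg 1": 80.0, "Alg 2": 75.0, "Alg 3": 85.0}
labels = ["DoS", "Normal", "Prob"]
cm = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]

fig, (ax_bar, ax_cm) = plt.subplots(1, 2, figsize=(10, 4))

# Task 1: bar chart comparing the accuracy of the algorithms.
ax_bar.bar(list(accuracies.keys()), list(accuracies.values()))
ax_bar.set_ylabel("Accuracy (%)")
ax_bar.set_ylim(0, 100)
ax_bar.set_title("Accuracy by algorithm")

# Task 2: confusion matrix as a heat map (rows = true class, columns = predicted).
image = ax_cm.imshow(cm, cmap="Blues")
ax_cm.set_xticks(range(len(labels)))
ax_cm.set_xticklabels(labels)
ax_cm.set_yticks(range(len(labels)))
ax_cm.set_yticklabels(labels)
ax_cm.set_xlabel("Predicted class")
ax_cm.set_ylabel("True class")
ax_cm.set_title("Confusion matrix")
for i, row in enumerate(cm):
    for j, count in enumerate(row):
        ax_cm.text(j, i, str(count), ha="center", va="center")
fig.colorbar(image, ax=ax_cm)

fig.tight_layout()
fig.savefig("model_comparison.png")
```

For the report, one confusion-matrix figure per algorithm and dataset, drawn by repeating the right-hand panel, keeps each scenario readable.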
Step 3: Results delivery
Once you have completed the data analysis task for your security project, you need to deliver the outcome. (PLEASE NOTE: the results obtained from the above steps must be submitted in REPORT format rather than as screenshots.)
Here, you need to write a report (at least 3500 words) based on the outcome and results you obtained by performing the above steps. The report will describe the algorithms used, their working principles, key parameters, and the results. Results should cover all the key performance measures and include comparative results in the form of tables, graphs, etc.
Attachment:- Predictive Modelling.rar