Reference no: EM132827551
NIT3202 Data Analytics for Cyber Security - Victoria university
In this assessment, you will apply supervised machine learning methods to classify Twitter spam (i have attached the twitter spam dataset) using the provided dataset. Table 1 shows the features description of the dataset
Follow instructions, complete all the tasks and organize your answers into an essay. R script, R screenshot, your results and explanations should be covered for each question.The answers should be written in the form of an essay based on the requirement of the assessment. e.g., introduction on the assessment, conclusion of the data exploration, and the detailed explanations on the task you have done.
Please save the all-r script file under my name "Noyanna NIT3202" so that it is visible in the screenshots, make sure there is screenshots for each steps showing the commands and the results and explain the tasks and output.
Here are your tasks:
1. Load dataset into R Studio, and randomly split the dataset to training dataset and testing dataset with the ratio of 9:1.
2. Use training dataset to train a machine learning model with the random forest algorithm for Twitter spam classification
3. Use testing dataset to test and evaluate the model trained in step 2 and print the confusion matrix.
4. Use training dataset to train another machine learning model with the K nearest neighbours algorithm.
5. Use testing data to test and evaluate the model trained in step 4 and print the confusion matrix.
6. Comparing the performance of Twitter spam classifiers established in step 2 and step 4, which algorithm can achieve better prediction results for this Twitter spam detection task? Why?
7. Change the ratio of training dataset and testing dataset to 8:2 and retrain random forest model and K nearest neighbours algorithm. Compare the performance with the classifiers established in step 2 and step 4. Which ratio can achieve better prediction results? Why?
Attachment:- Data analytics for cyber security.rar