Reference no: EM132403851
The topic of this assignment is malware classification using machine learning algorithms.
This is an open-ended assignment. You are encouraged to pick any dataset as long as it is related to malware, and to use any machine learning method to perform classification or clustering on the selected dataset (see next section for more details).
You will need to give a 10-minute in-class presentation.
Tasks
In this assignment, you are required to conduct malware analysis using machine learning methods. To be specific, you are expected to complete the following tasks.
I. Pick one data set from the data sets provided below, or find another related data set online.
II. If the data set you selected does not already have a training set and a testing set, split the data randomly into a training set and a testing set, where the training set takes 70% of the whole data set and the remaining 30% is the testing set.
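As an illustration of the random 70/30 split in task II, here is a minimal sketch using only the Python standard library. The function name split_train_test, the fixed seed, and the toy data are assumptions made for this example and are not part of the assignment.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(samples)       # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data: 10 (feature_vector, label) pairs standing in for malware samples.
data = [([i, i % 3], i % 2) for i in range(10)]
train, test = split_train_test(data)
print(len(train), len(test))
```

With class-imbalanced malware data, a stratified split (one that preserves the class ratio in both sets) may be a better choice than a purely random one.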
III. Select one machine learning method from Random Forest, Linear Regression, and Decision Tree.
You are also encouraged to try other machine learning methods that were not covered in class. For each method you select, conduct the following steps:
1). Define a model with a specific set of hyper-parameters;
2). Train the model using the training set;
3). Evaluate the trained model on both the training set and the testing set;
4). Change the choice of hyper-parameters and repeat steps 1) to 3);
5). Select the set of hyper-parameters that has the best testing performance;
6). Extract the feature importances and plot them in figures.
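Steps 1) to 6) could be sketched roughly as follows, assuming scikit-learn is installed and Random Forest is the chosen method. The synthetic data from make_classification, the small hyper-parameter grid, and the helper plot_importances are placeholders invented for this example; a real submission would load the selected malware data set instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a malware feature matrix (assumption, not real data).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Hypothetical hyper-parameter grid: 2 forest sizes x 2 depth limits.
grid = [{"n_estimators": n, "max_depth": d}
        for n in (10, 50) for d in (3, None)]

results = []
for params in grid:
    model = RandomForestClassifier(random_state=0, **params)  # step 1
    model.fit(X_train, y_train)                               # step 2
    train_acc = model.score(X_train, y_train)                 # step 3
    test_acc = model.score(X_test, y_test)
    results.append((test_acc, train_acc, params, model))      # step 4 is the loop
    print(params, "train acc:", train_acc, "test acc:", test_acc)

# Step 5: keep the configuration with the highest testing accuracy.
best_test_acc, best_train_acc, best_params, best_model = max(
    results, key=lambda r: r[0])
print("best:", best_params)

# Step 6: impurity-based feature importances of the best model.
importances = best_model.feature_importances_


def plot_importances(values, path="feature_importance.png"):
    """Save a bar chart of feature importances (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")          # render to a file, no display needed
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.bar(range(len(values)), values)
    ax.set_xlabel("Feature index")
    ax.set_ylabel("Importance")
    fig.savefig(path)
    plt.close(fig)
```

Calling plot_importances(importances) writes the figure to feature_importance.png. A large gap between training and testing accuracy for a given configuration suggests overfitting, which is why step 3) asks for both numbers.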