Reference no: EM132278576
Assignment - Risk analytics and Big Data
Data warehouse 2: freMTPL2freq and freMTPL2sev. In these two datasets, risk features are collected for 677,991 motor third-party liability policies (observed over one year). freMTPL2freq contains the risk features and the number of claims, while freMTPL2sev contains the claim amounts and the corresponding policy IDs.
This assignment is to be a structured report, using the headings: Introduction, Data, Methodology, Findings, Conclusion. You are required to:
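The relationship between the two tables can be sketched as follows: because freMTPL2sev holds one row per claim, claim amounts are summed per policy ID and then attached to the policy-level records from freMTPL2freq. This is a minimal Python illustration of the logic only, not the MATLAB code the assignment asks for. The toy rows are invented, and the column names `IDpol`, `ClaimNb`, and `ClaimAmount` are assumptions that should be checked against the attached files.

```python
from collections import defaultdict

# Toy stand-ins for the two tables (values are invented for illustration;
# column names are assumed, not confirmed by the brief).
freq_rows = [
    {"IDpol": 1, "ClaimNb": 0},
    {"IDpol": 2, "ClaimNb": 2},
    {"IDpol": 3, "ClaimNb": 1},
]
sev_rows = [
    {"IDpol": 2, "ClaimAmount": 1200.0},
    {"IDpol": 2, "ClaimAmount": 300.0},
    {"IDpol": 3, "ClaimAmount": 950.0},
]


def attach_total_claims(freq_rows, sev_rows):
    """Left-join the summed claim amounts onto the policy-level rows."""
    totals = defaultdict(float)
    for claim in sev_rows:
        totals[claim["IDpol"]] += claim["ClaimAmount"]
    # Policies with no matching claim row keep a total of 0.0.
    return [
        dict(row, TotalClaimAmount=totals.get(row["IDpol"], 0.0))
        for row in freq_rows
    ]


merged = attach_total_claims(freq_rows, sev_rows)
for row in merged:
    print(row)
```

Keeping every policy in the result, including those with no claims, matters here: dropping them would bias any frequency model towards policies that claimed.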
1. Explore, analyse and summarise the data
2. Apply an appropriate (un-)supervised learning methodology
a. Unsupervised learning:
i. Train and Test Models
ii. Principal Component Analysis (PCA)
iii. Hard Clustering
iv. The K-Means Algorithm
v. Gaussian Mixture Model
b. Supervised learning:
i. Predictive modeling algorithms:
ii. Classification techniques predict discrete responses
1. Regression techniques predict continuous responses
2. Binary vs. Multiclass Classification
3. Support Vector Machine (SVM)
4. k Nearest Neighbor (kNN)
5. Naïve Bayes
6. Discriminant Analysis
7. Decision Tree
8. Bagged and Boosted Decision Trees
9. Multiclass Support Vector Machines
iii. Cross Validation
iv. Creating Dummy Variables
v. Regression Methods
vi. After you run all the models in MATLAB, choose the most appropriate model and explain why.
vii. While running these models, explain the significance, if any, of each one.
viii. After writing the code for each model and part, explain what the code is used for. Please include all the steps.
3. Provide a rationale for using the chosen method, or show how various methods were applied before arriving at it.
4. Summarise and interpret findings
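For the cross-validation and model-selection items in step 2, one possible workflow is to score each candidate model on held-out folds and compare the average error. The sketch below is a minimal Python illustration of that idea, not the required MATLAB code. The two candidate models (a global-mean baseline and a per-group mean), the toy data, and all function names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_global_mean(xs, ys):
    """Baseline: always predict the mean response of the training data."""
    m = mean(ys)
    return lambda x: m


def fit_group_mean(xs, ys):
    """Predict the mean response of the training rows sharing the same x."""
    overall = mean(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    means = {g: mean(v) for g, v in groups.items()}
    # Fall back to the overall mean for groups unseen during training.
    return lambda x: means.get(x, overall)


def cv_mse(fit, xs, ys, k=5, seed=0):
    """Average mean squared error over k held-out folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean((ys[i] - model(xs[i])) ** 2 for i in test_idx))
    return mean(fold_errors)


# Invented toy data: one categorical feature and a continuous response.
xs = ["A", "B"] * 20
ys = [(1.0 if x == "A" else 3.0) + 0.1 * (i % 3) for i, x in enumerate(xs)]

candidates = {"global mean": fit_global_mean, "group mean": fit_group_mean}
scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "-> lowest cross-validated MSE:", best)
```

A lower cross-validated error favours a model, but the written justification should also weigh interpretability and suitability for the data.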
Report - The report should address the requirements, present the results with the aid of explained graphical visualisations, and include a rationale for the approaches taken.
MATLAB code should be included in the appendix.
Attachment: Assignment Files.rar