Reference no: EM133044691
Task 1: Create and explore Weka data file of type ARFF
Download the text file data.csv from the subject site (Canvas) and open it in a text editor such as WordPad or Notepad++ on Windows, or TextEdit on Mac. You need to explore this file and convert it into an ARFF file for Weka. The file contains a sample of real-life customer data. It is not fully formatted as a Weka (ARFF) file: it contains some formatting errors, and your task is to find and fix these errors to produce a valid ARFF file. Save the valid file as data.arff.
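For orientation, a valid ARFF file has an @relation line, one @attribute line per column (numeric, or a nominal list of values in braces), and an @data section with one comma-separated row per instance. The Python sketch below builds that layout from clean CSV text. It is only an illustration: the column names and values in the sample are hypothetical and may not match data.csv, it assumes no missing values, and it does not locate the formatting errors you are asked to fix by hand.

```python
import csv
import io


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Build ARFF text from CSV text whose first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data]
        if all(is_number(v) for v in column):
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: list the distinct values in braces.
            values = ",".join(sorted(set(column)))
            lines.append(f"@attribute {name} {{{values}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical sample; the real data.csv columns may differ.
sample = "age,region,mortgage\n48,INNER_CITY,NO\n40,TOWN,YES\n"
arff_text = csv_to_arff(sample, "customers")
print(arff_text)
```

Comparing your hand-corrected data.arff against this layout (header keywords, brace-delimited nominal values, a single @data marker) is one way to spot remaining errors before loading the file in Weka Explorer.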
Explore the data.arff dataset using Weka Explorer and answer the following questions. Make sure to include screenshots of the visualisations to support your answers.
1. Take a screenshot of your corrected ARFF file.
2. Which attribute in the dataset do you think is useless, providing no useful information for prediction?
3. How many attributes does the dataset have?
4. How many instances does the dataset have?
5. What is the class attribute in the data.arff dataset?
6. What proportion of customers have a mortgage and live in Inner City?
7. What proportion of customers have a mortgage and live in Inner City?
8. What proportion of customers have a mortgage and an income between $8000 and $29000?
9. How many customers are married and have no mortgage?
10. How many customers do not own a car and have a mortgage?
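The count and proportion questions above can be cross-checked outside Weka with a short script. The Python sketch below is a minimal example under stated assumptions: the attribute names (region, income, married, car, mortgage), their values, and the four toy rows are all hypothetical, and each proportion is taken over all customers in the dataset (use a different denominator if the question is read as a proportion of mortgage holders only).

```python
def count_matching(rows, predicate):
    """Count the rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row))


def proportion(rows, predicate):
    """Share of all rows that satisfy the predicate (0.0 for no rows)."""
    return count_matching(rows, predicate) / len(rows) if rows else 0.0


# Toy rows with hypothetical attribute names and values, not the real data.
rows = [
    {"region": "INNER_CITY", "income": 12000.0, "married": "YES", "car": "NO", "mortgage": "YES"},
    {"region": "TOWN", "income": 25000.0, "married": "YES", "car": "YES", "mortgage": "NO"},
    {"region": "INNER_CITY", "income": 40000.0, "married": "NO", "car": "NO", "mortgage": "NO"},
    {"region": "RURAL", "income": 9000.0, "married": "NO", "car": "YES", "mortgage": "YES"},
]

# Mortgage holders living in the inner city, as a share of all customers.
p_inner = proportion(
    rows, lambda r: r["mortgage"] == "YES" and r["region"] == "INNER_CITY"
)

# Mortgage holders with income between 8000 and 29000 (inclusive).
p_income = proportion(
    rows, lambda r: r["mortgage"] == "YES" and 8000 <= r["income"] <= 29000
)

# Married customers without a mortgage.
n_married = count_matching(
    rows, lambda r: r["married"] == "YES" and r["mortgage"] == "NO"
)

print(p_inner, p_income, n_married)
```

The answers submitted should still be supported by Weka Explorer screenshots; a script like this only serves as a sanity check on the numbers read from the visualisations.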
Task 2: Practical Analysis
Use the dataset from Task 1 to perform the data mining tasks for Task 2, and compare the performance of the following classification algorithms on this dataset:
- Naive Bayes
- HoeffdingTree
- SVM (or SMO)
- J48
Write a summary report that compares the performance of these algorithms. Comment on each algorithm's performance and accuracy using the metrics shown in the classifier output, such as the confusion matrix. In your report, state whether the algorithms differ in performance and which algorithm performs best. Include the tables, graphs, and screenshots needed to make your report understandable to the reader.
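When comparing classifiers, it helps to know how the headline figures follow from the confusion matrix. The Python sketch below derives accuracy and per-class precision and recall from a matrix laid out as in Weka's classifier output, where rows are actual classes and columns are the classes instances were classified as. The matrix values are made up for illustration and are not results from any of the algorithms above.

```python
def metrics_from_confusion(matrix):
    """Return (accuracy, precision per class, recall per class).

    matrix[i][j] is the number of instances of actual class i that
    were classified as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    accuracy = correct / total if total else 0.0

    precision, recall = [], []
    for i in range(n):
        predicted_i = sum(matrix[r][i] for r in range(n))  # column sum
        actual_i = sum(matrix[i])                          # row sum
        precision.append(matrix[i][i] / predicted_i if predicted_i else 0.0)
        recall.append(matrix[i][i] / actual_i if actual_i else 0.0)
    return accuracy, precision, recall


# Hypothetical two-class confusion matrix, for illustration only.
example = [
    [50, 10],
    [5, 35],
]
accuracy, precision, recall = metrics_from_confusion(example)
print(accuracy, precision, recall)
```

Computing the same figures for each algorithm's matrix gives a consistent basis for the comparison table in the report.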
Attachment:- data file.rar