Reference no: EM133254129
Assignment
Designing and Building a Prediction Model for Bike Buyer Data with a Classifier
Choose any two classifiers and apply them to your Bike Buyer data set.
Language to be used: Python
Plan your experiment with:
1. Determine the data preprocessing methods required for each of your classifiers
2. For each classifier,
2-1. Compare the accuracy of the classifier with two different sets of input parameters if applicable
2-2. Compare the accuracy of the classifier with two different data preprocessing methods.
2-3. Experiment for Feature Selection with PCA tools or Your Own Experiment (See Below for an example)
3. Compare the accuracy of each test of the classifiers
4. Discuss your results:
- Why your induced model differs for the same training data when you change the parameter values or the classifier.
- Why a certain parameter setting or classifier shows better accuracy than the others that you tried.
- Anything else you observed
Dataset to be used:
- Use your VTargetMail data
Phases:
Phase 1. Determine the data preprocessing methods to apply for each of your classifiers. For example:
- Discretization for Decision Tree
- Vectorization of a record for SVM
- Normalization for Neural Network
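The three transformations named above can be sketched in plain Python as follows. This is a minimal illustration, not a prescribed method: the records, the column names (Age, YearlyIncome, CommuteDistance), the bin count and the helper names are all assumptions made for the example, and equal-width binning is only one of several possible discretization schemes.

```python
# Toy records standing in for VTargetMail rows (values and column names are assumptions).
records = [
    {"Age": 25, "YearlyIncome": 30000.0, "CommuteDistance": "0-1 Miles"},
    {"Age": 40, "YearlyIncome": 60000.0, "CommuteDistance": "5-10 Miles"},
    {"Age": 55, "YearlyIncome": 90000.0, "CommuteDistance": "0-1 Miles"},
]


def equal_width_bins(values, n_bins):
    """Discretization: map numeric values to bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(values):
    """Vectorization: turn a categorical column into 0/1 indicator vectors."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return vectors, categories


def min_max_normalize(values):
    """Normalization: rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [r["Age"] for r in records]
incomes = [r["YearlyIncome"] for r in records]
commutes = [r["CommuteDistance"] for r in records]

age_bins = equal_width_bins(ages, 3)                      # e.g. input for a decision tree
commute_vectors, commute_categories = one_hot(commutes)   # e.g. input for an SVM
income_scaled = min_max_normalize(incomes)                # e.g. input for a neural network

print(age_bins)
print(commute_categories, commute_vectors)
print(income_scaled)
```

In practice a library such as scikit-learn provides equivalents (for instance `KBinsDiscretizer`, `OneHotEncoder` and `MinMaxScaler`); the hand-written helpers are used here only to keep the sketch self-contained.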
Phase 2. Design your Data Analytic Experiment with Two different Classifiers of Your Choice. Choose any two different classifiers covered in class (for example, Decision Tree, Naïve Bayesian, SVM, Neural Network, K Nearest Neighbor) or any other classifier, and compare the accuracy of their results.
Phase 2-1. Experiment to Find the Best Parameter Setting for your Classifier. For example:
Example 1: Decision Tree Classifier: C5 for GainRatioSplit, CART for GiniSplit, on the same set of data with different parameter settings as follows:
- Measure: Entropy, GINI
- Different Minimum Support Thresholds
- Different Complexity Penalty Degrees on the Number of Splits
Example 2: Neural Network: Test with two different topologies: the number of units in a hidden layer, the number of hidden layers
SVM: Test with different kernel functions
K Nearest Neighbor: Test with two different K values and distance metrics
Or alternatively
Phase 2-2. For Naïve Bayes, NN or SVM, experiment with two different data transformation methods for continuous and numeric attributes:
1) Data set as floating point without Discretization and Binarization
2) Data set with Discretization and Binarization
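As one possible shape for the parameter experiment of Phase 2-1, the sketch below runs a small K Nearest Neighbor classifier with two K values and two distance metrics and records the accuracy of each setting. The data is invented and already scaled to [0, 1], one training row is deliberately mislabeled to show why K matters, and all function names are assumptions made for this example.

```python
from collections import Counter


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k, distance):
    """Majority vote among the k training rows closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k, distance):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k, distance) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


# Invented rows: [scaled age, scaled income]; label 1 = bike buyer.
train_X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30],
           [0.80, 0.90], [0.90, 0.80], [0.70, 0.85],
           [0.25, 0.25]]          # last row is a deliberately mislabeled point
train_y = [0, 0, 0, 1, 1, 1, 1]
test_X = [[0.22, 0.22], [0.85, 0.90], [0.30, 0.25], [0.75, 0.70]]
test_y = [0, 1, 0, 1]

results = {}
for k in (1, 3):
    for name, metric in (("euclidean", euclidean), ("manhattan", manhattan)):
        results[(k, name)] = accuracy(train_X, train_y, test_X, test_y, k, metric)

for setting, acc in sorted(results.items()):
    print(setting, acc)
```

On this invented data the mislabeled row sits next to two of the test points, so K = 1 follows it while K = 3 outvotes it. The same loop structure applies to other classifiers: swap the classifier and iterate over its own parameters (split measure, kernel function, hidden layer sizes).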
Phase 2-3. Experiment for Feature Selection with either:
1) Feature Significance Analysis with PCA tools
2) Your Own Experiment as follows:
Simple Experiment for Feature Selection Methodology
2-3-1. Pick the best parameter setting and data transformation from Phase 2.
2-3-2. Apply your classifier with the best parameter set to each different feature set from your input file to see if there is any significant difference in the result for each iteration. (See below for an example)
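For option 1), a rough idea of what a PCA-based feature significance check computes is sketched below in plain Python: the features are standardized, their covariance matrix is built, and power iteration finds the first principal component, whose loadings show which features dominate the main direction of variance. The data, the column names and the helper names are assumptions for illustration; the toy Age and YearlyIncome columns are made perfectly correlated on purpose. Note that PCA is unsupervised, so it does not look at the BikeBuyer label.

```python
FEATURES = ["Age", "YearlyIncome", "NumberCarsOwned"]  # assumed column names

# Invented rows in FEATURES order.
X = [
    [30, 40000, 3],
    [35, 50000, 0],
    [40, 60000, 4],
    [45, 70000, 0],
    [50, 80000, 3],
]


def standardize(rows):
    """Center each column and scale it to unit sample variance.
    Assumes no column is constant (that would divide by zero)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / (n - 1)) ** 0.5
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance_matrix(Z):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(Z), len(Z[0])
    return [[sum(r[i] * r[j] for r in Z) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    d = len(cov)
    vec = [1.0 / d ** 0.5] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, vec


cov = covariance_matrix(standardize(X))
eigenvalue, loadings = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = eigenvalue / total_variance

for name, weight in zip(FEATURES, loadings):
    print(f"{name}: loading {weight:.3f}")
print(f"explained variance ratio of first component: {explained_ratio:.3f}")
```

A library implementation such as scikit-learn's `PCA` returns all components at once; power iteration is used here only so that the sketch needs no external packages.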
3. Validate your results with your Test Set to compare the accuracy of your models for each classifier with different parameter settings or different transformation methods.
4. Discuss your results:
- Why your induced model differs for the same training data when you change the parameter values or the classifier.
- Why a classifier shows better accuracy than the others for a certain parameter setting or with a different transformation method.
- Any observations you made
Feature Significance Analysis with PCA tools
Simple Experiment for Feature Selection Methodology
1. Simple experiment to choose the best feature set:
1-1. Pick the best model with the best parameter setting from Phases 1 and 2.
1-2. Apply your model with the best parameters to different input sets (created with different combinations of feature sets from your VTargetMail input file) to see if there are any significant differences in the accuracy of each feature set.
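Steps 1-1 and 1-2 could be organized along the following lines. This is only a sketch under stated assumptions: the fixed "best" model is taken to be a 3-nearest-neighbor classifier, the rows are invented and already scaled, and the column names and helper names are hypothetical. Every non-empty combination of features is evaluated on the same held-out test rows.

```python
from collections import Counter
from itertools import combinations

FEATURES = ["Age", "YearlyIncome", "NumberCarsOwned"]  # assumed column names

# Invented, pre-scaled rows: (values in FEATURES order, BikeBuyer label).
train = [
    ([0.10, 0.20, 0.9], 0), ([0.20, 0.35, 0.1], 0), ([0.30, 0.15, 0.5], 0),
    ([0.70, 0.80, 0.2], 1), ([0.80, 0.65, 0.8], 1), ([0.90, 0.85, 0.4], 1),
]
test = [
    ([0.15, 0.25, 0.7], 0), ([0.25, 0.30, 0.3], 0),
    ([0.75, 0.70, 0.6], 1), ([0.85, 0.90, 0.1], 1),
]


def squared_distance(a, b, idx):
    """Squared Euclidean distance restricted to the feature indices in idx."""
    return sum((a[i] - b[i]) ** 2 for i in idx)


def knn_predict(train_rows, query, idx, k=3):
    """Majority vote of the k nearest training rows, using only features in idx."""
    ranked = sorted(train_rows, key=lambda rec: squared_distance(rec[0], query, idx))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def accuracy(train_rows, test_rows, idx, k=3):
    hits = sum(knn_predict(train_rows, x, idx, k) == y for x, y in test_rows)
    return hits / len(test_rows)


# Evaluate the same model on every non-empty feature combination.
results = {}
for size in range(1, len(FEATURES) + 1):
    for idx in combinations(range(len(FEATURES)), size):
        names = tuple(FEATURES[i] for i in idx)
        results[names] = accuracy(train, test, idx)

for names, acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{acc:.2f}  {', '.join(names)}")
```

With many features the number of combinations grows exponentially, so an exhaustive loop like this is only practical for a handful of candidate columns; for larger sets a restricted search (for example, adding one feature at a time) is a common compromise.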