Reference no: EM132290605, Length: 4000 words
Assignment Requirement
In this assignment, you act as a professional analyst writing a report for your client. The report presents a decision tree analysis of a data set supplied by the client. The assignment is open-ended: its quality depends on what you do, how you do it, how you interpret the results, and how convincing the results are.
You are asked to analyze the Wine Quality data set that you used in a practical. The aim of the analysis is to show how wine quality is affected by the other factors in the data set. Write a report of your analysis that covers the following areas. Each part should be no longer than 5 pages.
1. Initial inspection of the dataset and discussion on how the observation may affect the result of the analysis.
In this part, include information about column types, value distributions, skewness, etc., and discuss the possible impact of the observed properties on the classifiers to be learnt.
Also investigate the correlation between columns to see whether some attributes are dependent, indicate how the dependence affects the learnt classifier, and state whether it is necessary to apply dimension reduction or feature selection.
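As a rough illustration of the two statistics mentioned in this part, the sketch below computes skewness for one column and the Pearson correlation between two columns using only the Python standard library. The function names and the sample values are hypothetical and are not taken from the Wine Quality data set; in practice the same figures can be read from whichever tool you use for the inspection.

```python
import math


def skewness(values):
    """Population skewness: the mean of the cubed z-scores.

    Returns 0.0 for a constant column, where skewness is undefined.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** 3 for v in values) / n


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns.

    Assumes neither column is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical toy columns, made up for illustration only.
column_a = [1.0, 1.0, 1.0, 1.0, 10.0]   # long right tail
column_b = [2.0, 2.1, 1.9, 2.0, 20.5]   # moves together with column_a

print("skewness of column_a:", skewness(column_a))
print("correlation a vs b:", pearson(column_a, column_b))
```

A clearly positive skewness points to a long right tail, and a correlation close to 1 or -1 between two input columns suggests that one of them may add little for the classifier, which is the kind of observation the discussion on feature selection can build on.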
2. Building a decision tree model using the SAS decision tree algorithm.
In this part, you need to try a range of the major parameters listed in the properties on the left-hand side of the decision tree node. Then choose the top three trees and present their important parameter settings and their performance results, including precision, recall, F-score, and any other performance indicators that you want to include. Next, present the best tree, describe it, and interpret it. The interpretation should cover the major split attributes (those affecting many tuples), the major decision groups, etc. Your interpretation must be written in language that people without a technical background can understand.
Before presenting the results, you should also summarize how data is processed for this model, what features are used, and how data is partitioned.
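For reference, the three named indicators can be computed from actual and predicted class labels as in the minimal sketch below. The function name, the label values "good" and "bad", and the sample predictions are assumptions made for illustration; they are not output from SAS or from the Wine Quality data set.

```python
def precision_recall_f1(actual, predicted, positive):
    """Precision, recall and F1-score for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels, made up for illustration only.
actual = ["good", "good", "bad", "bad", "good"]
predicted = ["good", "bad", "bad", "good", "good"]

precision, recall, f1 = precision_recall_f1(actual, predicted, positive="good")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Reporting the figures per class, with each class treated as positive in turn, helps when the quality classes are unbalanced, because overall accuracy alone can hide poor performance on the smaller class.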
3. Use data in a different way and then build another decision tree.
"Different way" may mean binning columns differently or using different features, or another aspect/consideration.
Build the model and present the results in the same way as in Task 2.
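One possible reading of "binning columns differently" is to group the quality score into classes using different cut points. The sketch below shows two such schemes side by side. The function name, the cut points, the class labels, and the sample scores are all hypothetical choices for illustration; the assignment does not prescribe any of them.

```python
from bisect import bisect_left


def bin_score(score, cut_points, labels):
    """Map a score to a label; a score equal to a cut point falls in the lower bin.

    cut_points must be sorted, and labels needs one more entry than cut_points.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("labels must have exactly one more entry than cut_points")
    return labels[bisect_left(cut_points, score)]


# Scheme A (assumed): two classes split at a score of 5.
two_class = {"cut_points": [5], "labels": ["low", "high"]}

# Scheme B (assumed): three classes split at scores of 4 and 6.
three_class = {"cut_points": [4, 6], "labels": ["low", "medium", "high"]}

# Hypothetical quality scores, made up for illustration only.
scores = [3, 5, 6, 7]
for score in scores:
    a = bin_score(score, **two_class)
    b = bin_score(score, **three_class)
    print(score, a, b)
```

Changing the cut points changes the class balance, so the second tree may differ from the first both in its split attributes and in its precision and recall per class.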
4. Model comparison.
Compare the two models you obtained above and draw conclusions from them. At the end, describe what you learnt from this analysis.
Your report must have a cover page, a table of contents, and an executive summary (1/2 to 3/4 of a page) in addition to the main contents. The font must be Ebrima or Calibri, size 10, single spaced.
Attachment:- Assignment--wine Quality.zip