Reference no: EM132212129
-Come up with one business problem (real or hypothetical) for each of the following types of data science solutions: classification, class probability estimation, clustering and association rule mining.
-As a concrete example, consider a set S of 14 people, eight of the non-write-off class and six of the write-off class. Using the dataset in Table 1, answer the following questions.
Table 1. “Write-off example for supervised segmentation”
Name     Balance   Age  Employed  Write-off
Mike     $40,000   35   Yes       No
John     $200,000  32   No        Yes
Matt     $60,000   53   No        No
Mark     $8,000    23   Yes       Yes
Mary     $100,000  43   No        No
Andy     $25,000   34   Yes       Yes
Dora     $39,000   18   Yes       No
Robert   $65,000   31   Yes       No
Bob      $8,200    27   Yes       Yes
Captain  $19,000   32   Yes       No
Michael  $72,000   43   Yes       Yes
Howard   $52,000   33   No        Yes
King     $105,000  36   No        No
Peter    $89,000   38   No        No
a) We are trying to predict whether a person will be a loan write-off. What is the target variable in Table 1? Which attributes can be used to predict the target variable?
b) What is the entropy of the dataset with respect to the target variable identified in question a)?
c) How much information gain is obtained by segmenting on the Employed attribute, with respect to the target variable identified in question a)? Is the attribute informative for segmenting the target variable?
d) If we categorize the “Balance” attribute into three groups using the cut points $10,000 and $50,000, how much information gain is obtained by segmenting on the Balance attribute, with respect to the target variable identified in question a)? Is the attribute informative for segmenting the target variable?
e) Use a tree-induction model to visualize the segmentation result from question d).
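
The entropy and information-gain arithmetic asked for in questions b) to d) can be checked with a short script. The following is a minimal sketch, not a model answer: the row data are copied from Table 1, while the function names (entropy, information_gain, balance_bin, leaf_counts) and the bin labels "low", "medium" and "high" are hypothetical names introduced only for illustration. It assumes base-2 logarithms and assumes a balance exactly equal to a cut point falls into the upper group; no balance in Table 1 sits on a cut point, so that choice does not affect the result here.

```python
from collections import Counter
from math import log2

# (name, balance, age, employed, write_off) rows copied from Table 1
ROWS = [
    ("Mike", 40000, 35, "Yes", "No"),
    ("John", 200000, 32, "No", "Yes"),
    ("Matt", 60000, 53, "No", "No"),
    ("Mark", 8000, 23, "Yes", "Yes"),
    ("Mary", 100000, 43, "No", "No"),
    ("Andy", 25000, 34, "Yes", "Yes"),
    ("Dora", 39000, 18, "Yes", "No"),
    ("Robert", 65000, 31, "Yes", "No"),
    ("Bob", 8200, 27, "Yes", "Yes"),
    ("Captain", 19000, 32, "Yes", "No"),
    ("Michael", 72000, 43, "Yes", "Yes"),
    ("Howard", 52000, 33, "No", "Yes"),
    ("King", 105000, 36, "No", "No"),
    ("Peter", 89000, 38, "No", "No"),
]

TARGET = 4  # index of the Write-off column


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def leaf_counts(rows, key):
    """Group rows by key(row) and count the target classes in each group."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), Counter())[row[TARGET]] += 1
    return groups


def information_gain(rows, key):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    parent = entropy([row[TARGET] for row in rows])
    children = 0.0
    for counts in leaf_counts(rows, key).values():
        size = sum(counts.values())
        children += (size / len(rows)) * entropy(list(counts.elements()))
    return parent - children


def balance_bin(row):
    """Assumed binning: below $10,000, $10,000 up to $50,000, $50,000 and above."""
    balance = row[1]
    if balance < 10000:
        return "low"
    if balance < 50000:
        return "medium"
    return "high"


parent_entropy = entropy([row[TARGET] for row in ROWS])
gain_employed = information_gain(ROWS, lambda row: row[3])
gain_balance = information_gain(ROWS, balance_bin)

print(f"entropy of S:             {parent_entropy:.4f}")
print(f"information gain Employed: {gain_employed:.4f}")
print(f"information gain Balance:  {gain_balance:.4f}")

# One-level tree on the binned Balance attribute, printed as text
for name, counts in leaf_counts(ROWS, balance_bin).items():
    print(f"Balance = {name}: {dict(counts)}")
```

Under these assumptions the script should report an entropy of roughly 0.985 for S, a gain of roughly 0.020 for Employed and roughly 0.208 for the binned Balance attribute. The class counts printed for each Balance group can serve as the leaves of the one-level tree requested in question e).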