Reference no: EM132328361
Assignment - Data Mining
These questions are based on exercises at the end of Chapter 2 of the textbook. Use the following tables to answer the questions where appropriate. The tables are from our text, Data Mining for Business Analytics, 3rd ed., Shmueli et al.
Table 2.7
Age | Income ($)
25 | 49,000
56 | 156,000
65 | 99,000
32 | 192,000
41 | 39,000
49 | 57,000
Table 2.5 SAMPLE FROM A DATABASE OF CREDIT APPLICATIONS
OBS | CHECK ACCT | DURATION | HISTORY | NEW CAR | USED CAR | FURNITURE | RADIO TV | EDUC | RETRAIN | AMOUNT | SAVE ACCT | RESPONSE
1 | 0 | 6 | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 1169 | 4 | 1
8 | 1 | 36 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 6948 | 0 | 1
16 | 0 | 24 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 1282 | 1 | 0
24 | 1 | 12 | 4 | 0 | 1 | 0 | 0 | 0 | 0 | 1804 | 1 | 1
32 | 0 | 24 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 4020 | 0 | 1
40 | 1 | 9 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 458 | 0 | 1
48 | 0 | 6 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 1352 | 2 | 1
56 | 3 | 6 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 783 | 4 | 1
64 | 1 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 14421 | 0 | 0
72 | 3 | 7 | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 730 | 4 | 1
80 | 1 | 30 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 3832 | 0 | 1
88 | 1 | 36 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 12612 | 1 | 0
96 | 1 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 15945 | 0 | 0
104 | 1 | 9 | 4 | 0 | 0 | 1 | 0 | 0 | 0 | 1919 | 0 | 1
112 | 2 | 15 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 392 | 0 | 1
2.1 Assuming that data mining techniques are to be used in the following cases, identify whether the task required is supervised or unsupervised learning.
a. Deciding whether to issue a loan to an applicant based on demographic and financial data (with reference to a database of similar data on prior customers).
b. In an online bookstore, making recommendations to customers concerning additional items to buy based on the buying patterns in prior transactions.
c. Identifying a network data packet as dangerous (virus, hacker attack) based on comparison to other packets whose threat status is known.
d. Identifying segments of similar customers.
e. Predicting whether a company will go bankrupt based on comparing its financial data to those of similar bankrupt and nonbankrupt firms.
f. Estimating the repair time required for an aircraft based on a trouble ticket.
g. Automated sorting of mail by zip code scanning.
h. Printing of custom discount coupons at the conclusion of a grocery store checkout based on what you just bought and what others have bought previously.
2.2 Describe the role of the validation partition and the test partition.
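As background for exercise 2.2, the following is a minimal sketch of how a dataset might be split at random into three partitions. The function name `partition`, the 50/30/20 fractions, and the seed are illustrative assumptions, not values taken from the text.

```python
import random

def partition(records, train_frac=0.5, valid_frac=0.3, seed=1):
    """Randomly split records into training, validation, and test lists.

    The fractions are hypothetical defaults; whatever is left after the
    training and validation slices becomes the test partition.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Stand-in record IDs 0..99 instead of real data.
train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Every record lands in exactly one partition, so records used to fit a model are never reused to assess it.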
2.3 Look at the sample from a database of credit applications in Table 2.5 in the text. This table is listed in the exercises at the end of Chapter 2. Do you think it was randomly sampled? Is it a useful sample?
2.5 When a model is fit to training data with zero error, what might be occurring? Why is this of concern?
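To make the situation in exercise 2.5 concrete, here is a small sketch using synthetic data (hypothetical, not from the text). The labels are pure noise, and the "model" is a one-nearest-neighbor rule that memorizes the training records. The names `make_data`, `predict_1nn`, and `error_rate` are assumptions made for illustration.

```python
import random

rng = random.Random(0)

def make_data(n):
    """Synthetic records: one numeric predictor and a coin-flip label."""
    return [(rng.random(), rng.choice([0, 1])) for _ in range(n)]

train, holdout = make_data(200), make_data(200)

def predict_1nn(x, data):
    """Return the label of the record in data whose predictor is closest to x."""
    return min(data, key=lambda rec: abs(rec[0] - x))[1]

def error_rate(data, reference):
    """Fraction of records in data misclassified by 1-NN built on reference."""
    wrong = sum(predict_1nn(x, reference) != y for x, y in data)
    return wrong / len(data)

print("training error:", error_rate(train, train))
print("holdout error: ", error_rate(holdout, train))
```

Each training record is its own nearest neighbor, so the rule reproduces the training labels exactly. Comparing the two printed error rates shows how well that memorization carries over to records the model has not seen.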
2.8 Normalize the data in Table 2.7, showing all calculations. Be sure to look at the homework hints in this week’s lecture.
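One possible way to lay out the calculations for exercise 2.8 is sketched below. It assumes two common conventions: z-score standardization using the sample standard deviation (n - 1 divisor), and min-max rescaling to [0, 1]. The lecture hints may call for a different convention, so the helper names `z_scores` and `min_max` and the choice of divisor are assumptions. The last income value is read as 57,000.

```python
from statistics import mean, stdev

# Table 2.7. The last income is read as 57,000 (an assumption where the
# printed value is unclear).
ages = [25, 56, 65, 32, 41, 49]
incomes = [49_000, 156_000, 99_000, 192_000, 39_000, 57_000]

def z_scores(values):
    """Standardize: (x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

def min_max(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

for name, col in [("Age", ages), ("Income", incomes)]:
    print(name, "mean:", round(mean(col), 2), "sd:", round(stdev(col), 2))
    print("  z-scores:", [round(z, 3) for z in z_scores(col)])
    print("  min-max: ", [round(v, 3) for v in min_max(col)])
```

After either transformation, Age and Income are on comparable scales, so the much larger income figures no longer dominate distance calculations.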
2.10 Two models are applied to a partitioned dataset. Model A is much more accurate than model B on the training data, but less accurate than model B on the validation data. Which model should be considered for final deployment and why?