Reference no: EM133003268
MITS5509 Intelligent Systems for Analytics
Carefully read the following two questions and provide appropriate answers.
Question 1:
The bankruptcy-prediction problem can be viewed as a classification problem. The data set for this problem includes one ratio (the WC column in the tables below) that has been computed from the financial statements of real-world firms. Such ratios have been used in studies involving bankruptcy prediction. The first sample (the training set) includes 68 data values on firms that went bankrupt and firms that did not. The second sample (the testing set) of 68 firms also consists of some bankrupt firms and some non-bankrupt firms. Your goal is to use different classifiers to build a model from the training set by randomly selecting 40 data points (20 from category 1 and 20 from category 0), and then to test its performance by randomly selecting 40 data points from the testing set. (Try to analyze the new cases manually before you run the neural network and see how well you do.)
Students must use the following classifiers. The number of classifiers depends on the number of group members; for example, a group of four members uses four of the following five classifiers.
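The balanced random selection described above could be sketched in Python as follows. This is only an illustration under assumptions: the function name `balanced_sample`, the tuple layout `(firm, WC, category)`, and the fixed seed are not part of the assignment, and the demo uses a handful of rows from the training table with 2 points per category instead of 20.

```python
import random

def balanced_sample(rows, n_per_class, seed=None):
    """Return n_per_class randomly chosen rows from each category, shuffled.

    rows: list of (firm, wc, category) tuples, where category is 0 or 1.
    """
    rng = random.Random(seed)  # a seed makes the selection reproducible
    sample = []
    for category in (1, 0):
        members = [r for r in rows if r[2] == category]
        if len(members) < n_per_class:
            raise ValueError(f"category {category} has only {len(members)} rows")
        # rng.sample draws without replacement, so no firm is picked twice
        sample.extend(rng.sample(members, n_per_class))
    rng.shuffle(sample)  # mix the two categories together
    return sample

# A few rows copied from the training table: (firm, WC, category).
rows = [
    (1, 309.577, 1), (2, 363.79, 1), (3, 341.399, 1), (4, 363.616, 1),
    (35, 189.826, 0), (36, 651.65, 0), (37, 487.494, 0), (38, 254.899, 0),
]
subset = balanced_sample(rows, n_per_class=2, seed=42)
print(subset)
```

For the assignment itself, the same call would use all 68 training rows with `n_per_class=20`. Recording the seed in the report is one way to let a marker reproduce the selection.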
1. Neural network
2. Support vector machine
3. Nearest neighbor algorithm
4. Decision tree
5. Naive Bayes
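To make one of these classifiers concrete, here is a minimal sketch of the nearest neighbor algorithm on the single WC feature, written in plain Python. It is an illustration, not the required implementation: the function name `knn_predict`, the choice of k = 3, and the hand-picked excerpt of training rows are all assumptions, and a group would normally rely on its chosen data mining tool instead.

```python
from collections import Counter

def knn_predict(train, wc, k=3):
    """Predict the category of a WC value by majority vote of its
    k nearest training points (distance = absolute difference in WC)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - wc))[:k]
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]

# Hand-picked excerpt of the training table: (WC, category).
train = [
    (309.577, 1), (363.79, 1), (341.399, 1),
    (651.65, 0), (487.494, 0), (575.646, 0),
]

# WC value of firm 3 in the testing table.
pred = knn_predict(train, 330.226, k=3)
print(pred)
```

An odd k avoids tied votes when there are only two categories. In the full training table the two categories overlap far more than in this excerpt, so accuracy on the real data should be expected to be lower than this toy example suggests.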
The following tables show the training sample and test data you should use for this major assignment.
Training Sample Data
Firm | WC | Category
1 | 309.577 | 1
2 | 363.79 | 1
3 | 341.399 | 1
4 | 363.616 | 1
5 | 323.673 | 1
6 | 323.353 | 1
7 | 350.371 | 1
8 | 240.602 | 1
9 | 220.057 | 1
10 | 287.837 | 1
11 | 274.6 | 1
12 | 278.494 | 1
13 | 234.267 | 1
14 | 284.923 | 1
15 | 190.62 | 1
16 | 327.76 | 1
17 | 211.94 | 1
18 | 373.571 | 1
19 | 219.891 | 1
20 | 193.489 | 1
21 | 204.333 | 1
22 | 205.657 | 1
23 | 362.361 | 1
24 | 285.562 | 1
25 | 352.649 | 1
26 | 400.44 | 1
27 | 307.301 | 1
28 | 240.314 | 1
29 | 322.995 | 1
30 | 408.197 | 1
31 | 209.027 | 1
32 | 198.979 | 1
33 | 340.418 | 1
34 | 320.154 | 1
35 | 189.826 | 0
36 | 651.65 | 0
37 | 487.494 | 0
38 | 254.899 | 0
39 | 575.646 | 0
40 | 160.712 | 0
41 | 269.729 | 0
42 | 513.301 | 0
43 | 1996.866 | 0
44 | 683.512 | 0
45 | 377.246 | 0
46 | 289.579 | 0
47 | 171.851 | 0
48 | 205.39 | 0
49 | 203.593 | 0
50 | 365.159 | 0
51 | 266.962 | 0
52 | 461.943 | 0
53 | 215.392 | 0
54 | 235.794 | 0
55 | 881.477 | 0
56 | 463.897 | 0
57 | 475.693 | 0
58 | 540.01 | 0
59 | 612.817 | 0
60 | 140.277 | 0
61 | 396.541 | 0
62 | 271.185 | 0
63 | 507.039 | 0
64 | 733.641 | 0
65 | 612.455 | 0
66 | 499.495 | 0
67 | 290.715 | 0
68 | 171.447 | 0
Testing Sample Data
Firm | WC
1 | 367.325
2 | 347.513
3 | 330.226
4 | 178.106
5 | 378.899
6 | 257.212
7 | 333.088
8 | 182.324
9 | 238.099
10 | 329.643
11 | 294.644
12 | 281.666
13 | 308.086
14 | 317.079
15 | 245.139
16 | 354.662
17 | 292.256
18 | 306.79
19 | 222.396
20 | 367.628
21 | 342.115
22 | 353.326
23 | 336.39
24 | 298.008
25 | 266.396
26 | 243.554
27 | 172.184
28 | 362.479
29 | 249.981
30 | 327.877
31 | 286.696
32 | 182.762
33 | 338.347
34 | 302.57
35 | 299.651
36 | 247.595
37 | 339.311
38 | 366.139
39 | 398.295
40 | 205.129
41 | 371.419
42 | 175.406
43 | 476.159
44 | 359.144
45 | 315.97
46 | 329.629
47 | 399.552
48 | 442.799
49 | 255.405
50 | 408.036
51 | 497.195
52 | 249.674
53 | 292.026
54 | 481.193
55 | 394.76
56 | 273.175
57 | 311.517
58 | 238.067
59 | 292.459
60 | 2010.227
61 | 637.604
62 | 379.869
63 | 268.318
64 | 416.08
65 | 377.011
66 | 355.757
67 | 319.223
68 | 240.423
From the above data set, the group has to prepare a report that includes the following:
1. Explain the process of building each classifier using the training set (add screenshots).
2. Explain how you evaluated each classifier.
3. Create the confusion matrix based on a 70% (training) / 30% (testing) split.
4. Predict the category of any 40 randomly chosen values in the Testing Sample Data table.
5. Compare the results of the different classifiers and discuss which one is best and why.
Note: Students can use any open-source or free data mining software, such as Python, Statistica Data Miner, Weka, RapidMiner, KNIME, or MATLAB.
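For groups working in Python, the 70% / 30% split and the confusion matrix from item 3 might look like the sketch below. Several things here are assumptions rather than requirements of the assignment: the helper names `split_70_30`, `confusion_matrix`, and `accuracy`, the choice to treat category 1 as the "positive" class, and the demo labels, which are made up purely to show the counting and are not results from any classifier.

```python
import random

def split_70_30(rows, seed=None):
    """Shuffle a copy of rows and split it 70% training / 30% testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

def accuracy(cm):
    """Fraction of all cases that were classified correctly."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total if total else 0.0

# Splitting 40 selected points gives 28 for training and 12 for testing.
rows = list(range(40))  # stand-ins for the 40 selected data points
train_rows, test_rows = split_70_30(rows, seed=0)

# Made-up labels, only to demonstrate the counting.
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1]
cm = confusion_matrix(actual, predicted)
print(cm, accuracy(cm))
```

Using the same split and the same seed for every classifier keeps the comparison in item 5 like-for-like.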
Question 2:
Create a dashboard. To create the dashboard, the group can use the above data set or any other data set. The group has to prepare a report that includes the following:
1. Write an introduction to the data set used and add the reference (link).
2. Create at least four figures (different graphs) and add them to the dashboard.
3. Add a screenshot of each step.
4. Describe the figures in the dashboard.
Students can use any software to create the dashboard, such as Microsoft Excel, Power BI, or Tableau.
The above list of documents is not necessarily in any order. The chronological order in which we cover these topics in lectures is not meant to dictate the order in which you collate them into one coherent document for your assignment.
Your report must include a title page with the title of the assignment and the names and ID numbers of all group members, and a contents page showing the page numbers and titles of all major sections of the report. All figures must have captions and figure numbers and must be referenced within the document.
Place captions for figures below the figure and captions for tables above the table. Include a footer with the page number. Your report should use 1.5 line spacing with a 12-point Times New Roman font.
Include references where appropriate. Citation of sources (if any are used) is mandatory and must be in APA style.
Attachment: Intelligent Systems for Analytics.rar