Reference no: EM133192703
Question 1.
Discuss why data analysis and data mining have become increasingly popular, almost to the point of being a requirement for any mid-size or large business. Give at least three reasons, number them, and explain each one.
Question 2.
Assume two attributes have a correlation of 0.02. What does this tell you about the relationship between the two attributes? Answer the same question assuming the correlation is -0.98.
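As background for this question, the following is a minimal sketch of how a Pearson correlation coefficient can be computed, assuming that "correlation" here means Pearson's r. The function name `pearson` and both data sets are hypothetical and chosen only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, not taken from the assignment.
x = [1, 2, 3, 4, 5, 6]
y_decreasing = [12.1, 9.8, 8.2, 5.9, 4.1, 2.0]  # falls steadily as x rises
y_unrelated = [2, 5, 1, 1, 5, 2]                # no linear trend in x

r_decreasing = pearson(x, y_decreasing)  # close to -1
r_unrelated = pearson(x, y_unrelated)    # close to 0
print(r_decreasing, r_unrelated)
```

The coefficient always lies between -1 and 1. Its sign gives the direction of the linear relationship and its magnitude gives the strength.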
Question 3.
Define the terms training set and test set, and explain the function of each one.
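For context, a minimal sketch of a random hold-out split is shown here. The function name `train_test_split`, the 30% test fraction, and the toy data are assumptions made for illustration; the assignment does not prescribe a particular split.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and split them into disjoint training and test sets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(rows)       # copy, so the caller's data is not modified
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows are held out; the rest are used for fitting.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data: ten record identifiers.
rows = list(range(10))
train, test = train_test_split(rows)
print(len(train), len(test))
```

The two parts never share a record, so a model built on `train` is evaluated on records it has not seen.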
Question 4.
What is overfitting? Why is it especially problematic for decision tree induction? How can overfitting be addressed?
Question 5.
Given two classification models:
- Model M1: accuracy = 85%, tested on 30 instances
- Model M2: accuracy = 75%, tested on 5000 instances
Which test would help determine which model is better?
a. Test of Reliability
b. Test of Accuracy
c. Test of Model Fitness
d. Test of Significance
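To see why the number of test instances matters when comparing the two accuracies, one option is a normal-approximation confidence interval for each accuracy. This is a rough sketch under stated assumptions: the function name `accuracy_interval` is hypothetical, the 95% level (z = 1.96) is chosen for illustration, and the normal approximation is only a coarse guide for a sample as small as 30.

```python
import math

def accuracy_interval(acc, n, z=1.96):
    """Approximate confidence interval for an accuracy measured on n instances."""
    # Standard error of a proportion under the normal approximation.
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    # Clamp to [0, 1], since an accuracy cannot fall outside that range.
    return max(0.0, acc - half_width), min(1.0, acc + half_width)

m1 = accuracy_interval(0.85, 30)    # Model M1: 85% on 30 instances
m2 = accuracy_interval(0.75, 5000)  # Model M2: 75% on 5000 instances
print(m1, m2)
```

The interval for M1 is much wider than the interval for M2, because an accuracy measured on 30 instances is a far less precise estimate than one measured on 5000.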
Attachment: Training set and Test set.rar