Reference no: EM133114135
Describe Questions
1) Data Mining is:
A. Most applicable in large datasets
B. Discovering patterns and hidden trends in the data
C. Retrospective analyses of data
D. For providing accurate models and correct predictions
2) (T/F) Data Mining requires a good understanding of statistics and computer science.
3) Data Mining relies on:
A. Cleaned and Curated data
B. Unstructured data
C. Computational efficiency of the algorithm
D. Training data
E. Non-experimental (Observational) data
4) The model selection process depends on several criteria including:
A. Hypothesis to be proved or disproved
B. Type of data available
C. Underlying methods such as association, etc.
D. All of the above
5) (T/F) Association mining typically requires you to identify strong rules that satisfy minimum support and confidence thresholds.
6) Interestingness of patterns in a dataset can be determined by these methods:
A. Correlation
B. Association Rules
C. Classification
D. Lift & Chi Square Test
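As background for questions 5 and 6, the following is a minimal sketch of how support, confidence, and lift might be computed for a single association rule. The function name `rule_metrics` and the toy `baskets` data are illustrative assumptions, not part of the assignment.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Hypothetical helper for illustration; transactions is a list of item sets.
    """
    n = len(transactions)
    a = set(antecedent)
    c = set(consequent)
    # Count transactions containing the antecedent, the consequent, and both.
    count_a = sum(1 for t in transactions if a <= set(t))
    count_c = sum(1 for t in transactions if c <= set(t))
    count_ac = sum(1 for t in transactions if (a | c) <= set(t))

    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Toy data (assumed for the example only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
```

A lift above 1 suggests the items co-occur more often than expected under independence; a lift below 1 suggests the opposite.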
7) (T/F) R² is a measure of the explanatory power of the independent variables.
8) (T/F) Model fit refers to how well the variables correlate with one another in a model
9) Sensitivity and Specificity are two values useful in:
A. Receiver Operating Characteristic curve
B. Sigmoid curve
C. Logit curve
D. Sinusoidal curve
E. None of the above
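As background for question 9, here is a minimal sketch of how sensitivity and specificity could be computed from binary labels and predictions. The function name and the sample labels are assumptions made for this example only.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1.

    Hypothetical helper for illustration.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Sample labels (assumed for the example only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```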
10) (T/F) It's best to compare and contrast models using information criteria such as AIC/BIC for individual and hybrid models.
11) Statistical inference refers to:
A. Predicting the outcome of a model run
B. Probability of an event occurrence
C. Measuring dependent variable and any error terms to arrive at a solution
D. None of the above
12) (T/F) Sample and Population in Statistics refer to how clean the dataset is before data modeling.
13) The following technique is useful for a single descriptive measure of income by age:
A. Variance
B. Central Tendency
C. Outliers
D. All of the above
14) (T/F) Probability theory is useful in statistics for improving upon a 'random guess' about events occurring.
15) Probability of joint occurrence refers to:
A. Two independent events
B. Co-occurring events
C. Conditionally independent events
D. Multiplying the probabilities of individual events
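As background for question 15, the sketch below enumerates the outcomes of two fair dice and computes joint probabilities directly by counting. The dice setting and the event names are assumptions chosen for illustration; the point is only to compare a joint probability with the product of the individual probabilities.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))


def first_even(o):
    return o[0] % 2 == 0


def second_high(o):
    return o[1] > 4


def sum_high(o):
    return o[0] + o[1] >= 10


# Joint probability of two events that do not influence each other.
p_joint_indep = prob(lambda o: first_even(o) and second_high(o))
# Joint probability of two events that are related through the first die.
p_joint_dep = prob(lambda o: first_even(o) and sum_high(o))

# Products of the individual probabilities, for comparison.
p_product_indep = prob(first_even) * prob(second_high)
p_product_dep = prob(first_even) * prob(sum_high)
```

In this toy setting the joint probability equals the product only for the first pair of events.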
16) In the article: Advanced Scout - Data Mining and Knowledge Discovery in NBA Data
Describe the purpose of creating the data mining software (application), i.e., what value does it add?
17) In the article: Advanced Scout - Data Mining and Knowledge Discovery in NBA Data
Describe the 4 general steps used in the application as part of data mining, including a possible data structure from which the application reads the data.
18) A few applications of Text Mining & NLP (Natural Language Processing) are:
A. Web reviews and ratings
B. Medical Records
C. Grading Exams
D. Social Media
19) Describe any Data Mining Application, and write a hypothesis statement for the problem.
20) Focus on how to build features that are predictive.