1. The Naïve Bayes classifier is the best choice of classifier when you want accurate probability estimates. True or False?
2. The "bag of words" method of data mining text considers each document as a collection of words without regard for word sequence or context. True or False?
3. Which of the following more advanced methods of mining text does NOT treat each document as a collection of individual, unrelated words? (A short n-gram sketch follows the answer options.)
A) N-gram sequences
B) Named entity extraction
C) Topic models
D) All of the above
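For contrast with the bag-of-words sketch above, here is a small, illustrative sketch of n-gram extraction (the `ngrams` helper is hypothetical, not a library call): adjacent-word sequences such as "not good" retain some of the local context that a pure bag of words discards.

```python
# Extract contiguous n-grams from a document; bigrams (n=2) capture local word order.
def ngrams(document: str, n: int = 2) -> list[str]:
    tokens = document.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("the service was not good", n=2))
# ['the service', 'service was', 'was not', 'not good']
```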
4. The strength of an association measures how often we see that association in a data set. True or False?
5. Suppose you run a business where you need a model that can make very fast classifications and that can also be updated quickly as new training cases come along. Which model would best meet these requirements? (A short sketch follows the answer options.)
A) Naïve Bayes
B) Tree induction
C) k-NN
D) Logistic regression
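As background for the trade-off in Question 5, here is a toy, from-scratch Naive Bayes sketch (the `TinyNaiveBayes` class and its add-one smoothing are illustrative simplifications, not a library API): training amounts to keeping counts, so a new case is absorbed by incrementing a few counters, and classification is only a handful of multiplications.

```python
from collections import defaultdict

class TinyNaiveBayes:
    """Toy categorical Naive Bayes: the counts are the entire 'model'."""

    def __init__(self):
        self.total = 0
        self.class_counts = defaultdict(int)                          # N(class)
        self.feature_counts = defaultdict(lambda: defaultdict(int))   # N(class, feature=value)

    def update(self, features: dict, label: str) -> None:
        # Incremental learning: no pass over previously seen cases is needed.
        self.total += 1
        self.class_counts[label] += 1
        for name, value in features.items():
            self.feature_counts[label][(name, value)] += 1

    def predict(self, features: dict) -> str:
        def score(label):
            # P(class) * product of P(feature=value | class), with crude add-one
            # smoothing (the +2 denominator assumes roughly binary feature values).
            p = self.class_counts[label] / self.total
            for item in features.items():
                p *= (self.feature_counts[label][item] + 1) / (self.class_counts[label] + 2)
            return p
        return max(self.class_counts, key=score)

nb = TinyNaiveBayes()
nb.update({"weather": "sunny", "weekend": "yes"}, "buys")
nb.update({"weather": "rainy", "weekend": "no"}, "skips")
print(nb.predict({"weather": "sunny", "weekend": "yes"}))  # 'buys'
```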
6. Which of the following modeling approaches would be the best choice if we are trying to understand the typical behavior of a customer in our brick-and-mortar retail store?
A) Co-occurrence grouping
B) Profiling
C) Link prediction
D) Latent information discovery
7. Steam, a popular online platform for purchasing and playing computer video games, has a "Featured & Recommended" section on its home page. It is customized for each user and shows games available for purchase on the platform that the user does not yet own. The creation of this section of the home page is an example of:
A) Co-occurrence grouping
B) Text-based data mining
C) Profiling
D) Link prediction
8. Data reduction methods can help uncover underlying latent information that is common to groups of your attributes. True or False?
9. Ensemble methods combine the results of several models and leverage the fact that multiple models are better on average than any one model. True or False?
10. Which of the following is true concerning the application of ensemble methods to classification decision trees? (A sketch contrasting bagging and random forests follows the answer options.)
A) Bagging and boosting methods both involve creating models sequentially
B) Bagging and random forest methods reduce the variance of predictions at the expense of increasing bias
C) Random forest methods for creating decision trees use the entire set of attributes available for each node when calculating information gain
D) Bagging methods do not involve bootstrapping and splitting the training data set while random forests do
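To ground the distinctions in Question 10, here is a brief sketch contrasting bagged trees with a random forest (it assumes scikit-learn is installed; the synthetic dataset, `n_estimators=100`, and the other settings are arbitrary illustrations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: each tree is grown on a bootstrap sample of the training set,
# but every split may consider all 20 attributes.
bagged_trees = BaggingClassifier(n_estimators=100, random_state=0)

# Random forest: bootstrap samples as well, but each split considers only a
# random subset of attributes (about sqrt(20) here), which decorrelates the trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)

for name, model in [("bagging", bagged_trees), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean cross-validated accuracy {scores.mean():.3f}")
```

Both ensembles average many trees built in parallel (not sequentially, which is the boosting approach), and both rely on bootstrapped training samples; the random subset of attributes considered at each split is what distinguishes a random forest from plain bagging.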