Reference no: EM133316875
Questions
1. Why is EDA not enough for obtaining and implementing actionable insights?
A. EDA is not scalable.
B. Results and insights are hard to quantify.
Only A is correct
Only B is correct
Both A and B are correct
Neither is correct.
2. Which of the following is/are correct regarding Tidyverse packages?
A. dplyr is useful for data wrangling.
B. tidyr is useful for reshaping the data.
Only A is correct
Only B is correct
Both A and B are correct
Neither is correct.
3. Which of the following is/are advantages of the RDS format over CSV?
A. It preserves column types.
B. It compresses the storage size of the data.
Only A is correct
Only B is correct
Both A and B are correct
Neither is correct.
4. It can be inferred from the lecture that linear regression is the dark horse of business analytics as business analysts are yet to realize its real potential.
True or False
5. What is a dependent variable in a regression problem?
The primary key in the dataset
The variable which we mathematically model as a function of other variables
The variable that needs to be dummy coded.
6. What is an independent variable?
A. A variable on which the value of the dependent variable is assumed to depend
B. A predictor variable
Only A is correct
Only B is correct
Both A and B are correct.
Neither A nor B is correct.
7. Which of these could be the goal(s) of creating a linear regression model? (one or more options may be correct/apply)
Explaining relationships
Making inferences
Classifying observations into one of two or more discrete categories
8. Which of the following is true regarding regression? (one or more options may be correct/apply)
Regression is like performing exploratory data analysis in multiple dimensions.
Regression predicts the average value of a dependent variable based on a linear combination of the independent variables.
It helps an analyst make statements with a quantifiable degree of confidence.
9. Suppose we want to include a categorical variable, product category, as an independent variable in a regression model for predicting revenue. There are 5 different values that appear in the product category column. What is the correct number of dummy variables that should be created to represent the product category column in the regression model?
10
5
4
1
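The dummy-coding idea behind question 9 can be explored with a short sketch. It assumes reference-level (treatment) coding in a model that includes an intercept, so one level acts as the baseline and is represented by all zeros. The category names and the helper dummy_code are hypothetical and used only for illustration.

```python
# Hypothetical product categories; the names are illustrative only.
categories = ["books", "toys", "games", "music", "garden"]


def dummy_code(values, levels):
    """Encode each value as len(levels) - 1 indicator columns.

    The first level is treated as the reference category, so it is
    represented by a row of all zeros (assumes a model with an intercept).
    """
    non_reference = levels[1:]
    return [[1 if v == level else 0 for level in non_reference] for v in values]


rows = dummy_code(categories, categories)
n_dummies = len(rows[0])  # number of indicator columns per observation

for name, row in zip(categories, rows):
    print(name, row)
```

Creating one indicator per level in a model that also has an intercept would make the columns perfectly collinear, which is why one level is usually left out as the baseline.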
10. Select the correct statement:
The correlation between variables X and Y is the same as the correlation between Y and X.
The correlation between variables X and Y may or may not be the same as the correlation between variables Y and X, depending on the units of the variables X and Y.
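The two statements in question 10 can be checked numerically. The sketch below computes the Pearson correlation by hand, swaps the argument order, and then changes the units of one variable. The data and variable names are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: price in dollars and units sold.
price_usd = [10.0, 12.0, 15.0, 19.0, 25.0]
units_sold = [200.0, 180.0, 150.0, 130.0, 90.0]
price_cents = [p * 100 for p in price_usd]  # same prices, different units

r_xy = pearson(price_usd, units_sold)
r_yx = pearson(units_sold, price_usd)       # arguments swapped
r_cents = pearson(price_cents, units_sold)  # units changed

print(r_xy, r_yx, r_cents)
```

Comparing the three printed values shows how the coefficient behaves when the variables are swapped or rescaled by a positive constant.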
11. Which of these indicates that there is a significant relationship between an independent variable and a dependent variable in a simple linear regression? (one or more options may be correct/apply)
A large enough p-value
A large enough t value
A large enough R-squared
A large enough F-statistic
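The statistics named in question 11 can be computed by hand for a simple linear regression. This is a minimal sketch using the standard ordinary least squares formulas; the data and the helper simple_ols are hypothetical. It does not compute p-values, which would require the t or F distribution.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns the slope, its t value, and the model F-statistic.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))  # residual sum of squares
    ssr = sum((f - mean_y) ** 2 for f in fitted)        # regression sum of squares
    mse = sse / (n - 2)                                 # residual variance estimate
    t_value = slope / math.sqrt(mse / sxx)              # slope / standard error
    f_stat = ssr / mse                                  # one predictor, so 1 df
    return slope, t_value, f_stat


# Made-up data: advertising spend and revenue.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
revenue = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

slope, t_value, f_stat = simple_ols(ad_spend, revenue)
print(slope, t_value, f_stat)
```

With a single predictor, the F-statistic equals the square of the slope's t value, so the two tests carry the same information in this setting.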
12. R-squared is the percent of variance in the Independent Variable explained by the Dependent Variable.
True
False
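The usual definition of R-squared can be sketched for a simple linear regression as one minus the ratio of the residual sum of squares to the total sum of squares of the dependent variable. The data and the helper r_squared below are hypothetical.

```python
def r_squared(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation in y left unexplained by the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variation in y around its mean.
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Made-up data: x is the independent variable, y the dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r2 = r_squared(x, y)
print(r2)
```

Both sums of squares are taken over the dependent variable y, which is the detail to keep in mind when reading the statement in question 12.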