BIG DATA AND DATA ANALYTICS
EXERCISE 1 [R-CODE]
Use R to create a variable called "posrating" in the dataframe. The variable takes the value 1 if ratings >= 6.8 and the value 0 if ratings < 6.8. Use R to perform a logistic regression that regresses the newly created variable "posrating" on aggregate_followers, dummy_sequel, netlikes, and sentiment in order to predict the probability that a movie has a positive rating. Interpret the coefficients and report the results of the logistic regression in APA style (including a logistic regression table and the AIC value).
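To make concrete what the logistic regression estimates, here is a minimal plain-Python sketch (not the R workflow the exercise asks for). It builds a 0/1 posrating from the 6.8 cutoff, fits a one-predictor logistic model by gradient ascent on the log-likelihood, and computes AIC = 2k - 2 log L. The toy data and the helper names (`make_posrating`, `fit_logistic`, `log_likelihood`, `aic`) are hypothetical. R's `glm()` with `family = binomial` fits the same kind of model by iteratively reweighted least squares and handles several predictors at once.

```python
import math

def make_posrating(ratings, cutoff=6.8):
    # 1 if the rating is at or above the cutoff, otherwise 0
    return [1 if r >= cutoff else 0 for r in ratings]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=5000):
    # Gradient ascent on the mean Bernoulli log-likelihood for p = sigmoid(b0 + b1*x)
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted probability
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def log_likelihood(x, y, b0, b1):
    ll = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def aic(ll, k):
    # k = number of estimated parameters (intercept + slopes)
    return 2 * k - 2 * ll

# Hypothetical toy data: sentiment scores and movie ratings
sentiment = [-2, -1, 0, 1, 2, 3, 1, -1, 4, 0]
ratings = [5.1, 6.0, 6.9, 6.5, 7.2, 7.8, 7.0, 5.9, 8.1, 6.2]
posrating = make_posrating(ratings)

b0, b1 = fit_logistic(sentiment, posrating)
ll = log_likelihood(sentiment, posrating, b0, b1)
print("posrating:", posrating)
print("b0 =", round(b0, 3), "b1 =", round(b1, 3), "odds ratio =", round(math.exp(b1), 3))
print("AIC =", round(aic(ll, 2), 2))
```

A positive slope means the odds of a positive rating rise with the predictor; exp(b1) is the odds ratio for a one-unit increase. When comparing models fitted to the same data, the one with the lower AIC is preferred.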
EXERCISE 2 [R-CODE]
Use R to create a confusion matrix. Report the confusion matrix. Calculate and interpret sensitivity, specificity, and accuracy.
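The three rates follow directly from the four cells of the confusion matrix. The sketch below, in plain Python with hypothetical predicted probabilities and an assumed 0.5 classification cutoff, counts the cells and derives sensitivity (true positive rate), specificity (true negative rate), and accuracy. In R the cell counts would typically come from `table(actual, predicted)`.

```python
def confusion_matrix(actual, predicted):
    # Cell counts for a binary classifier, with 1 = positive rating
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fn, fp, tn

def rates(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)  # share of actual positives classified as positive
    specificity = tn / (tn + fp)  # share of actual negatives classified as negative
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # share of all cases classified correctly
    return sensitivity, specificity, accuracy

# Hypothetical observed classes and predicted probabilities
actual = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.7, 0.4, 0.3, 0.6, 0.55, 0.8, 0.2]
predicted = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 cutoff

tp, fn, fp, tn = confusion_matrix(actual, predicted)
sens, spec, acc = rates(tp, fn, fp, tn)
print("            pred 0  pred 1")
print(f"actual 0  {tn:6d}  {fp:6d}")
print(f"actual 1  {fn:6d}  {tp:6d}")
print("sensitivity", sens, "specificity", spec, "accuracy", acc)
```

With these toy values the classifier finds 3 of 4 positive movies (sensitivity 0.75) but only 2 of 4 negative ones (specificity 0.5), for an accuracy of 0.625. Moving the cutoff trades sensitivity against specificity.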
EXERCISE 3 [R-CODE]
Use R to create two logistic regression models. In the first model, regress posrating on sentiment. In the second model, regress posrating on budget, dummy_sequel, and sentiment. Plot the ROC curve, with its area under the curve (AUC), for each of the two models and explain the plots. Use the AUC values to compare the two models and interpret the results.
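An ROC curve plots sensitivity against 1 - specificity as the classification cutoff sweeps from high to low, and the AUC is the area beneath it. The plain-Python sketch below (hypothetical data and function names) builds the curve's points, integrates them with the trapezoidal rule, and cross-checks the result against the equivalent pairwise definition: the probability that a randomly chosen positive movie receives a higher predicted probability than a randomly chosen negative one, with ties counted as one half.

```python
def roc_points(actual, scores):
    # Use every distinct score as a cutoff (classify positive if score >= cutoff)
    pos = sum(actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if a == 1 and s >= cut)
        fp = sum(1 for a, s in zip(actual, scores) if a == 0 and s >= cut)
        points.append((fp / neg, tp / pos))  # (1 - specificity, sensitivity)
    return points

def auc_trapezoid(points):
    # Area under the piecewise-linear ROC curve
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def auc_pairs(actual, scores):
    # Share of (positive, negative) pairs where the positive scores higher; ties count 1/2
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))

# Hypothetical observed classes and predicted probabilities from one model
actual = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.3, 0.6, 0.55, 0.8, 0.2]
print(auc_trapezoid(roc_points(actual, scores)), auc_pairs(actual, scores))
```

Both functions return 0.875 here. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so the model with the larger AUC discriminates better between positively and negatively rated movies.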
EXERCISE 4 [R-CODE]
Use the winsor function discussed in Week 3 to create a variable "sentiment_winsor" with a multiplier of 2.2. Use R to create a regression tree that uses budget and sentiment to predict ratings. Then, create a scatterplot of budget and sentiment, and add the partitions of the regression tree. Interpret at least 2 partitions of the partitioned scatterplot.
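The Week 3 `winsor` function is not reproduced in this document, so the sketch below assumes the common median-and-MAD rule used by a widely circulated R `winsor` function: values more than `multiple` scaled median absolute deviations from the median are clipped to that bound. It is written in plain Python; the 1.4826 constant mirrors the default scaling of R's `mad()`, and the data are hypothetical. If the Week 3 version uses a different centre or spread (for example the mean and standard deviation), the bounds will differ.

```python
import statistics

def winsor(x, multiple=2.2):
    # Hypothetical re-implementation (assumed median +/- multiple * MAD rule)
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    med = statistics.median(x)
    # Scaled MAD, using the same 1.4826 constant as R's mad() default
    mad = 1.4826 * statistics.median([abs(v - med) for v in x])
    lower, upper = med - multiple * mad, med + multiple * mad
    # Clip each value into [lower, upper]; values inside the bounds are unchanged
    return [min(max(v, lower), upper) for v in x]

# Hypothetical sentiment scores with two extreme values
sentiment = [-3, 0, 1, 2, 2, 3, 4, 25]
sentiment_winsor = winsor(sentiment, multiple=2.2)
print(sentiment_winsor)
```

Here only the extreme values -3 and 25 are pulled in, to roughly -2.89 and 6.89; the other observations are unchanged.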
EXERCISE 5 [R-CODE]
Based on the regression tree created in Exercise 4, use cross-validation to determine the optimal tree size and prune the tree. Plot and interpret the tree.
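Pruning by cross-validation is usually framed as cost-complexity (weakest-link) pruning. As a sketch of the standard criterion, each subtree T of the full tree is scored by its residual sum of squares plus a penalty alpha times its number of terminal nodes |T|, where R_m is the region of the m-th terminal node and the fitted value in it is the mean rating of the training observations falling there:

```latex
C_\alpha(T) = \sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha \, |T|
```

Setting alpha = 0 keeps the full tree, and larger alpha values favour smaller trees; cross-validation picks the alpha (equivalently, the tree size) with the lowest cross-validated error, and the tree is then pruned to that size.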
EXERCISE 6 [R-CODE]
Use R to create a classification tree to predict posrating. As predictors, take into account the variables aggregate_followers, comments, likes, dislikes, and sentiment. Use cross-validation to determine the optimal tree size and prune the tree. Plot and interpret the tree.
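A classification tree chooses each split to make the resulting groups as pure as possible. The plain-Python sketch below scores every candidate cutoff on a single predictor by its weighted Gini impurity and keeps the best one. The `likes` values and the function names are hypothetical, and Gini is only one common criterion; depending on the R package used, the default may be deviance (entropy) instead.

```python
def gini(labels):
    # Gini impurity 1 - p1^2 - p0^2 for a list of 0/1 class labels
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1 - p1) ** 2

def best_split(x, y):
    # Try the midpoint between each pair of adjacent distinct x values;
    # keep the cutoff with the lowest size-weighted Gini impurity
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < cut]
        right = [yi for xi, yi in zip(x, y) if xi >= cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[1]:
            best = (cut, score)
    return best

# Hypothetical predictor values and posrating labels
likes = [10, 40, 55, 80, 120, 150, 300, 500]
posrating = [0, 0, 1, 0, 1, 1, 1, 1]
cut, impurity = best_split(likes, posrating)
print("split at likes =", cut, "weighted Gini =", impurity)
```

With these toy values the best cutoff is likes = 100: the group with likes >= 100 is entirely positive, and the other group has one positive out of four, giving a weighted impurity of 0.1875. A full tree repeats this search over all predictors and then recursively within each resulting group.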
This lab project is based on a dataset about movie success in 2014 and 2015 by Ahmad et al. (2015), which is available on the online platform by Lichman et al. (2013). Download the file moviedata.csv from Blackboard and then practice the following topics in preparation for Lab Project 5.
PREPARATION
In preparation for Lab Project 5, load the moviedata.csv dataset. Use the dataset to practice the following topics:
- Logistic regressions: coefficient estimates, predicted probabilities, residuals, standard errors, confidence intervals, z-values, AIC, log likelihood
- Search for and watch online videos and blog entries about implementing a classifier (based on logistic regressions and classification trees; e.g., YouTube, R-Bloggers, Stack Overflow)
- Reporting of results in APA style
- Analysing and interpreting the results of logistic regressions
- Creating and interpreting a confusion matrix
- Sensitivity, specificity, and accuracy
- Area under the ROC curve (AUC)
- Decision trees: regression trees and classification trees
- Pruning decision trees
- Analysing and interpreting the results of decision trees
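For the logistic-regression items in this list (standard errors, z-values, confidence intervals), the Wald statistics printed for each coefficient can be reproduced from the estimate and its standard error alone. The plain-Python sketch below uses a hypothetical coefficient of 0.85 with a standard error of 0.30, and `wald_summary` is a hypothetical helper. Note that R's `confint()` on a glm may return profile-likelihood intervals rather than these Wald intervals, whereas `confint.default()` returns Wald intervals.

```python
import math
from statistics import NormalDist

def wald_summary(b, se, level=0.95):
    # Wald z-value: estimate divided by its standard error
    z = b / se
    # Two-sided p-value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    # Critical value, about 1.96 for a 95% interval
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower, upper = b - crit * se, b + crit * se
    return {
        "z": z,
        "p": p,
        "ci": (lower, upper),                           # interval on the log-odds scale
        "odds_ratio": math.exp(b),                      # multiplicative change in odds per unit
        "or_ci": (math.exp(lower), math.exp(upper)),    # interval on the odds-ratio scale
    }

res = wald_summary(0.85, 0.30)  # hypothetical estimate and standard error
print(res)
```

exp(b) converts the log-odds coefficient into an odds ratio, and exponentiating the interval bounds gives the interval for the odds ratio.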
Attachment: Dataset_Description_and_Preparation.rar