Create a classification tree to predict posrating

Reference no: EM132687457

BIG DATA AND DATA ANALYTICS

EXERCISE 1 [R-CODE]
Use R to create a variable called "posrating" in the dataframe. The variable takes on the value 1 if ratings >= 6.8. For ratings < 6.8 it takes on the value 0. Use R to perform a logistic regression that regresses the newly created variable "posrating" on aggregate_followers, dummy_sequel, netlikes, and sentiment in order to predict the probability that a movie has a positive rating. Interpret the coefficients and report the results of the logistic regression in APA style (including a logistic regression table and reporting of AIC-values).
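A minimal sketch of the steps in Exercise 1, using base R only. The data frame `movies` is simulated here as a stand-in so the block runs on its own; its values are not the real data, and the actual analysis would load moviedata.csv instead. The column names follow the exercise text.

```r
# Simulated stand-in data (assumption): replace with read.csv("moviedata.csv").
set.seed(42)
n <- 300
movies <- data.frame(
  aggregate_followers = runif(n, min = 0, max = 10000),
  dummy_sequel        = rbinom(n, size = 1, prob = 0.2),
  netlikes            = rnorm(n, mean = 1000, sd = 400),
  sentiment           = rnorm(n, mean = 0, sd = 5)
)
movies$ratings <- 6.5 + 0.1 * movies$sentiment + rnorm(n, sd = 0.8)

# posrating is 1 when ratings >= 6.8 and 0 otherwise.
movies$posrating <- ifelse(movies$ratings >= 6.8, 1, 0)

# Logistic regression of posrating on the four predictors.
fit <- glm(posrating ~ aggregate_followers + dummy_sequel + netlikes + sentiment,
           data = movies, family = binomial)

summary(fit)            # estimates, standard errors, z-values, p-values
exp(coef(fit))          # odds ratios, often easier to interpret than log-odds
confint.default(fit)    # Wald confidence intervals on the log-odds scale
AIC(fit)                # AIC value for the APA-style report
head(predict(fit, type = "response"))  # predicted probabilities
```

Each coefficient is a change in the log-odds of a positive rating per one-unit increase in the predictor, holding the others constant; exponentiating gives the corresponding odds ratio.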

EXERCISE 2 [R-CODE]
Use R to create a confusion matrix. Report the confusion matrix. Calculate and interpret sensitivity, specificity, and accuracy.
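One way to build the confusion matrix for Exercise 2 is sketched below. The data are simulated as a stand-in for moviedata.csv, and the 0.5 classification cutoff is an assumption, since the exercise does not name one.

```r
# Simulated stand-in data (assumption): replace with read.csv("moviedata.csv").
set.seed(42)
n <- 300
movies <- data.frame(
  aggregate_followers = runif(n, min = 0, max = 10000),
  dummy_sequel        = rbinom(n, size = 1, prob = 0.2),
  netlikes            = rnorm(n, mean = 1000, sd = 400),
  sentiment           = rnorm(n, mean = 0, sd = 5)
)
movies$ratings   <- 6.5 + 0.1 * movies$sentiment + rnorm(n, sd = 0.8)
movies$posrating <- ifelse(movies$ratings >= 6.8, 1, 0)

fit <- glm(posrating ~ aggregate_followers + dummy_sequel + netlikes + sentiment,
           data = movies, family = binomial)

# Classify as positive when the predicted probability is at least 0.5 (assumed cutoff).
pred_prob  <- predict(fit, type = "response")
pred_class <- factor(ifelse(pred_prob >= 0.5, 1, 0), levels = c(0, 1))
actual     <- factor(movies$posrating, levels = c(0, 1))

cm <- table(Predicted = pred_class, Actual = actual)
print(cm)

TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["1", "0"]; FN <- cm["0", "1"]

sensitivity <- TP / (TP + FN)     # share of actual positives predicted as positive
specificity <- TN / (TN + FP)     # share of actual negatives predicted as negative
accuracy    <- (TP + TN) / sum(cm)  # share of all movies classified correctly
c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy)
```

Fixing the factor levels to `c(0, 1)` keeps the table 2 x 2 even if the model never predicts one of the classes.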

EXERCISE 3 [R-CODE]
Use R to create two logistic regression models. In the first model, regress posrating on sentiment. In the second model, regress posrating on budget, dummy_sequel, and sentiment. Plot an "Area under the ROC Curve" plot for each of the two models and explain the plots. Use the AUC values to compare the two models and interpret the results.
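The comparison in Exercise 3 could be sketched as follows. To stay free of add-on packages, the ROC points and the AUC are computed by hand; `roc_points()` and `auc_rank()` are hypothetical helper names, not functions from a library, and the data are simulated as a stand-in for moviedata.csv.

```r
# Simulated stand-in data (assumption): replace with read.csv("moviedata.csv").
set.seed(42)
n <- 300
movies <- data.frame(
  budget       = runif(n, min = 1, max = 200),
  dummy_sequel = rbinom(n, size = 1, prob = 0.2),
  sentiment    = rnorm(n, mean = 0, sd = 5)
)
movies$ratings   <- 6.0 + 0.1 * movies$sentiment + 0.005 * movies$budget +
  rnorm(n, sd = 0.8)
movies$posrating <- ifelse(movies$ratings >= 6.8, 1, 0)

m1 <- glm(posrating ~ sentiment, data = movies, family = binomial)
m2 <- glm(posrating ~ budget + dummy_sequel + sentiment,
          data = movies, family = binomial)

# True and false positive rates at every distinct score threshold.
roc_points <- function(score, label) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(score[label == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(score[label == 0] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# AUC via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

roc1 <- roc_points(fitted(m1), movies$posrating)
roc2 <- roc_points(fitted(m2), movies$posrating)
auc1 <- auc_rank(fitted(m1), movies$posrating)
auc2 <- auc_rank(fitted(m2), movies$posrating)

plot(roc1$fpr, roc1$tpr, type = "l", col = "blue",
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curves")
lines(roc2$fpr, roc2$tpr, col = "red")
abline(0, 1, lty = 2)  # reference line for a classifier that guesses at random
legend("bottomright", lty = 1, col = c("blue", "red"),
       legend = c(sprintf("Model 1 (AUC = %.3f)", auc1),
                  sprintf("Model 2 (AUC = %.3f)", auc2)))
```

An AUC of 0.5 corresponds to random guessing and 1 to perfect separation, so the model with the larger AUC ranks positive-rating movies above negative-rating ones more often. Both values here are in-sample and therefore optimistic.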

EXERCISE 4 [R-CODE]
Use the winsor function discussed in Week 3 to create a variable "sentiment_winsor" with a multiplier of 2.2. Use R to create a regression tree that uses budget and sentiment to predict ratings. Then, create a scatterplot of budget and sentiment, and add the partitions of the regression tree. Interpret at least 2 partitions of the partitioned scatterplot.
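A rough sketch of Exercise 4 follows. The course's Week 3 winsor function is not shown in this document, so the `winsor()` below is a hypothetical stand-in that clamps values to the fences Q1 - multiplier * IQR and Q3 + multiplier * IQR; the course version may differ. The tree uses the rpart package and, following the exercise text, the predictors budget and sentiment. The data are simulated as a stand-in for moviedata.csv.

```r
library(rpart)

# Simulated stand-in data (assumption): replace with read.csv("moviedata.csv").
set.seed(42)
n <- 300
movies <- data.frame(
  budget    = runif(n, min = 1, max = 200),
  sentiment = rnorm(n, mean = 0, sd = 5)
)
movies$ratings <- 6.0 + 0.1 * movies$sentiment + 0.005 * movies$budget +
  rnorm(n, sd = 0.8)

# Hypothetical winsor function (assumption): clamp to IQR-based fences.
winsor <- function(x, multiplier = 2.2) {
  q     <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr   <- q[2] - q[1]
  lower <- q[1] - multiplier * iqr
  upper <- q[2] + multiplier * iqr
  pmin(pmax(x, lower), upper)
}
movies$sentiment_winsor <- winsor(movies$sentiment, multiplier = 2.2)

# Regression tree predicting ratings from budget and sentiment.
tree_fit <- rpart(ratings ~ budget + sentiment, data = movies, method = "anova")

# Show the partitions by predicting on a fine grid: each shaded rectangle
# is one leaf of the tree, coloured by its predicted rating.
gx   <- seq(min(movies$budget), max(movies$budget), length.out = 200)
gy   <- seq(min(movies$sentiment), max(movies$sentiment), length.out = 200)
grid <- expand.grid(budget = gx, sentiment = gy)
z    <- matrix(predict(tree_fit, newdata = grid), nrow = length(gx))

image(gx, gy, z, col = heat.colors(12),
      xlab = "budget", ylab = "sentiment",
      main = "Regression tree partitions")
points(movies$budget, movies$sentiment, pch = 16, cex = 0.5)
```

Each rectangle can be read as a rule such as "budget below a cutoff and sentiment above a cutoff", with the shade giving the mean rating of the movies that fall inside it. If the tree is fitted with the tree package instead, its `partition.tree()` function draws the split lines directly.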

EXERCISE 5 [R-CODE]
Based on the regression tree created in Exercise 4, use cross-validation to determine the optimal tree size and prune the tree. Plot and interpret the tree.
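Cross-validated pruning for Exercise 5 might look like the sketch below, again with rpart and simulated stand-in data. Growing the tree with a small complexity parameter (`cp = 0.001`) and choosing the size with the lowest cross-validated error are assumptions; the one-standard-error rule is a common alternative.

```r
library(rpart)

# Simulated stand-in data (assumption): replace with read.csv("moviedata.csv").
set.seed(42)
n <- 300
movies <- data.frame(
  budget    = runif(n, min = 1, max = 200),
  sentiment = rnorm(n, mean = 0, sd = 5)
)
movies$ratings <- 6.0 + 0.1 * movies$sentiment + 0.005 * movies$budget +
  rnorm(n, sd = 0.8)

# Grow a deliberately large tree; xval = 10 requests 10-fold cross-validation.
big_tree <- rpart(ratings ~ budget + sentiment, data = movies, method = "anova",
                  control = rpart.control(cp = 0.001, xval = 10))

printcp(big_tree)  # one row per candidate tree size, with xerror = CV error

# Pick the complexity parameter with the smallest cross-validated error.
cp_tab  <- big_tree$cptable
best    <- which.min(cp_tab[, "xerror"])
best_cp <- cp_tab[best, "CP"]

pruned <- prune(big_tree, cp = best_cp)

# plot() fails on a root-only tree, so fall back to printing in that case.
if (nrow(pruned$frame) > 1) {
  plot(pruned, margin = 0.1)
  text(pruned, use.n = TRUE)
} else {
  print(pruned)
}
```

In the plotted tree, each internal node shows a split rule and each leaf shows the mean rating and the number of movies that reach it.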

EXERCISE 6 [R-CODE]
Use R to create a classification tree to predict posrating. As predictors, take into account the variables aggregate_followers, comments, likes, dislikes, and sentiment. Use cross-validation to determine the optimal tree size and prune the tree. Plot and interpret the tree.
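Exercise 6 follows the same pattern with a classification tree. The sketch below uses rpart with `method = "class"`; the data are simulated as a stand-in for moviedata.csv, and the initial `cp = 0.001` and the minimum-error pruning rule are assumptions.

```r
library(rpart)

# Simulated stand-in data (assumption): replace with read.csv("moviedata.csv").
set.seed(42)
n <- 300
movies <- data.frame(
  aggregate_followers = runif(n, min = 0, max = 10000),
  comments            = rpois(n, lambda = 50),
  likes               = rpois(n, lambda = 2000),
  dislikes            = rpois(n, lambda = 200),
  sentiment           = rnorm(n, mean = 0, sd = 5)
)
movies$ratings <- 6.5 + 0.1 * movies$sentiment + rnorm(n, sd = 0.8)

# A factor outcome makes rpart fit a classification tree.
movies$posrating <- factor(ifelse(movies$ratings >= 6.8, 1, 0), levels = c(0, 1))

class_tree <- rpart(
  posrating ~ aggregate_followers + comments + likes + dislikes + sentiment,
  data = movies, method = "class",
  control = rpart.control(cp = 0.001, xval = 10)
)

# Choose the tree size with the lowest cross-validated error and prune to it.
cp_tab  <- class_tree$cptable
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]
pruned  <- prune(class_tree, cp = best_cp)

# plot() fails on a root-only tree, so fall back to printing in that case.
if (nrow(pruned$frame) > 1) {
  plot(pruned, margin = 0.1)
  text(pruned, use.n = TRUE)
} else {
  print(pruned)
}

# Predicted class for each movie from the pruned tree.
pred <- predict(pruned, type = "class")
table(Predicted = pred, Actual = movies$posrating)
```

Each leaf of the plotted tree is labelled with the predicted class, and `use.n = TRUE` adds the counts of 0s and 1s that reach it, which shows how pure each leaf is.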

This lab project is based on a dataset about movie success in 2014 and 2015 by Ahmad et al. (2015), which is available on the online platform by Lichman et al. (2013). Download the file moviedata.csv from Blackboard and then practice the following topics in preparation for Lab Project 5.

PREPARATION
In preparation for Lab Project 5, load the moviedata.csv dataset. Use the dataset to practice the following topics:
- Logistic regressions: coefficient estimates, predicted probabilities, residuals, standard errors, confidence intervals, z-values, AIC, log likelihood
- Search for online videos and blog entries about implementing a classifier based on logistic regressions and classification trees (e.g., YouTube, R-Bloggers, Stack Overflow)
- Reporting of results in APA style
- Analysing and interpreting the results of logistic regressions
- Creating and interpreting a confusion matrix
- Sensitivity, specificity, and accuracy
- Area under the ROC curve (AUC)
- Decision trees: regression trees and classification trees
- Pruning decision trees
- Analysing and interpreting the results of decision trees

Attachment:- Dataset_Description_and_Preparation.rar
