Reference no: EM132997711
Question: The "Movie Dataset (original)" tab is a partial dataset from Kaggle.com, and comprises 3515 movies scraped from the Internet Movie Database (IMDb). Use this dataset to answer the following questions.
Remember to create a partition (use a training set of 60% and a validation set of 40%).
The dataset may require scrubbing and/or the creation of categorical variables.
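The assignment itself is to be completed in Excel, but the scrubbing and partitioning logic can be sketched in a few lines of Python to make the idea concrete. This is only an illustration under stated assumptions: the toy rows, the field names `budget` and `gross`, the helper names `scrub` and `partition`, and the seed value are all hypothetical and are not part of the dataset or the assignment.

```python
import random

def scrub(rows, required):
    """Drop rows where any required field is missing (None or empty string)."""
    return [r for r in rows if all(r.get(k) not in (None, "") for k in required)]

def partition(rows, train_frac=0.6, seed=42):
    """Shuffle a copy of the rows, then split into (training, validation)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy rows standing in for the movie records.
movies = [{"title": f"m{i}", "budget": 10 + i, "gross": 15 + 3 * i} for i in range(10)]
movies.append({"title": "incomplete", "budget": None, "gross": 5})

clean = scrub(movies, required=["budget", "gross"])
train, valid = partition(clean)  # 60% training, 40% validation
print(len(clean), len(train), len(valid))
```

Fixing the random seed means the same rows land in the same partition on every run, which makes error rates comparable across models.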
Provide all your answers and working models in a single Excel workbook; use different worksheets as needed.
Answer the following questions (which are also listed in the Assignment tab of the spreadsheet):
1. Describe what you did to prepare the data for analysis, and describe any assumptions you are making in answering the questions that follow.
2. Create a scatterplot comparing budget (y-axis) against title_year (x-axis). What pattern do you observe? What could explain this trend?
3. Define a movie to be a "success" if its gross revenue is at least double its budget. Create a logistic regression model to predict whether a movie will be a success, based on the number of critics, duration, actor_1 Facebook likes, director Facebook likes, budget, and year. What is the error rate on the validation set?
5. Of the given six predictors, what are the strongest three predictors of whether a movie is a success? How did you determine this?
6. Using a boosting neural network, can you reduce the error rate? If so, to what? What might be a drawback to using this type of method, compared to logistic regression?
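To make the "success" definition and the validation error rate in question 3 concrete, here is a minimal Python sketch. It is not the Excel or Frontline Solver workflow the assignment requires: it fits a plain logistic regression by gradient descent on made-up numbers. The toy data, the single standardized predictor, and the function names `is_success`, `fit_logistic`, `predict`, and `error_rate` are all assumptions made for illustration.

```python
import math

def is_success(gross, budget):
    """Label a movie 1 if gross revenue is at least double its budget, else 0."""
    return 1 if gross >= 2 * budget else 0

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, xi):
    """Predicted probability of success; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the average log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w, xi, cutoff=0.5):
    """Classify as success when the predicted probability reaches the cutoff."""
    return 1 if score(w, xi) >= cutoff else 0

def error_rate(w, X, y):
    """Share of records whose predicted class differs from the actual class."""
    return sum(predict(w, xi) != yi for xi, yi in zip(X, y)) / len(y)

# Hypothetical (gross, budget) pairs showing the labelling rule.
labels = [is_success(g, b) for g, b in [(250, 100), (150, 100)]]

# Hypothetical standardized predictor values and labels.
X_train = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_valid = [[-1.2], [-0.3], [0.4], [1.8]]
y_valid = [0, 0, 1, 1]

w = fit_logistic(X_train, y_train)
validation_error = error_rate(w, X_valid, y_valid)
print(labels, validation_error)
```

The error rate is always measured on the validation records, which the model did not see while fitting. For question 5, one rough way to rank predictors is to standardize them first and compare the absolute sizes of the fitted weights; a regression tool may instead report p-values, which serve the same purpose.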
The steps are as follows:
1. Use Excel and Frontline Solver to build a model (or models) for the problem.
2. Provide all your answers and working models in a single Excel workbook; use different worksheets (tabs) as needed. Give clear, simple names to your worksheets.
3. Ensure that your answers are in a format suitable for consumption by decision-makers (that is, it should not take a math professor to understand your answers).
4. Write in complete sentences; do not just provide numbers.
Attachment: Project Supervised Learning Assignment.rar