Excel Data Analysis Capstone Project
Overview
This project requires you to use the tools learned throughout this portion of the course to build a model for a real-world situation: predicting the success of NBA teams.
Before the world ended in late 2020, an even more tragic event occurred - the NBA season was interrupted! The season was shortened from its regular 82 games, but we really need a full set of statistics. The only solution the world has is to use your advanced analysis skills to predict what those win totals should be; these win predictions will be written into the history books, so we need to be accurate!
Part 1: Collect and Prepare Data
See the end of this document for more detail on the data collection.
For this model you will need to prepare two sets of data. The easiest way is to have each on one sheet in the same workbook:
• Source Data - 5 (or more) seasons of data, from 2018-2019 and back. This will have all statistics from the table (including wins and playoff status), but with the win-proxy stats removed.
o Note: This model must not contain any win-like statistics such as losses, winning percentage, Pythagorean wins/losses, margin of victory, SRS, etc. Once you've created the sheet you must remove the win-proxy stats; some of those are L, PW, PL, MOV, and SRS. There may be others if you've added optional data to your model. Double-check this, as including any win/loss statistics will ruin the accuracy of your model.
• Subject Data - The most recent season (2019-2020). This will have the same set of statistics as the source data, except that the wins and playoff columns should be blank - this is what we are predicting. The classification model will use the data to predict the playoff status, and the prediction model will predict the number of wins.
o Your goal is to use the past set of statistics to build two models that will predict the number of wins and the playoff status for the 2019-2020 season.
• For each of the classification and prediction models, exclude the other target from the features; i.e., do not use wins to predict playoffs, and do not use playoffs to predict wins.
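The column rules above can be sketched in a few lines. The brief expects this to be done in Excel, so the Python below only illustrates the logic. The win-proxy names L, PW, PL, MOV, and SRS come from the brief; the column labels "W", "Playoffs", "Season", "Team", "ORtg", and "DRtg" are assumptions for illustration and may differ from your sheet.

```python
# Win-proxy columns named in the brief; extend this set if you added optional data.
WIN_PROXIES = {"L", "PW", "PL", "MOV", "SRS"}

# Assumed labels for the two targets and the identifier columns (hypothetical).
TARGETS = {"W", "Playoffs"}
IDENTIFIERS = {"Season", "Team"}


def leaked_columns(all_columns):
    """List any win-proxy columns still present, as a double check."""
    return sorted(c for c in all_columns if c in WIN_PROXIES)


def feature_columns(all_columns):
    """Columns either model may use as inputs.

    Both targets are excluded, so the wins model never sees playoff status
    and the playoff model never sees wins.
    """
    excluded = WIN_PROXIES | TARGETS | IDENTIFIERS
    return [c for c in all_columns if c not in excluded]


# Placeholder header row, not real data.
header = ["Season", "Team", "W", "L", "PW", "PL", "MOV", "SRS",
          "ORtg", "DRtg", "Playoffs"]

print(leaked_columns(header))   # columns that still need to be removed
print(feature_columns(header))  # inputs shared by both models
```

Each model then pairs these shared inputs with its own target column only.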
Creating a Predictive Model
Once your data is prepared you can begin creating your predictive model. Any and all of the tools we looked at in the course are available to you. You may find as you build your model that data needs to be added to or removed from your initial worksheet. You may also choose to use other techniques such as normalization and partitioning to create a more accurate model.
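For readers unsure what normalization and partitioning involve, here is a minimal sketch, assuming z-score normalization and a random 70/30 training/validation split. Both choices are assumptions; the course tools may offer other options. The function names and sample values are hypothetical.

```python
import random
import statistics


def z_scores(values):
    """Rescale a column to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mean) / spread for v in values]


def partition(rows, validation_share=0.3, seed=1):
    """Shuffle rows reproducibly, then split into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - validation_share))
    return shuffled[:cut], shuffled[cut:]


# Placeholder values, not real statistics.
pace = [98.0, 100.0, 102.0, 104.0]
print(z_scores(pace))

train, validation = partition(range(10))
print(len(train), len(validation))
```

Normalizing puts statistics measured on different scales on a comparable footing, and holding back a validation set lets you check accuracy on rows the model has not seen.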
As you go through this process, take note of what method you are using, what changes you make to the model data, and why you are making those decisions. You will need to present both your model and the reasoning behind it: why you built it as you did, and why it is superior to the alternatives that proved less accurate. Developing your model is the most important part of this project, so ensure you are making logical improvements and documenting the reasoning and impact of each.
Note: Use the source data to build a predictive model targeted at predicting number of wins, then use that model to predict the number of wins on the subject data.
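The fit-then-predict flow in the note can be sketched as follows. This is a simplified illustration, assuming ordinary least squares on a single feature; a real submission would likely use several features and the regression tools covered in the course. The feature (an effective field goal percentage column) and all numbers are placeholders, not real NBA statistics.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, xs):
    """Apply a fitted (intercept, slope) pair to new feature values."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def rmse(actual, predicted):
    """Root mean squared error; lower means predictions sit closer to actual wins."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Source data: feature and known wins (placeholder values).
source_efg = [0.50, 0.52, 0.53, 0.55]
source_wins = [30, 41, 44, 56]

model = fit_line(source_efg, source_wins)
print(rmse(source_wins, predict(model, source_efg)))

# Subject data: same feature, wins column blank, so the model fills it in.
subject_efg = [0.51, 0.54]
print(predict(model, subject_efg))
```

Recording an error measure such as RMSE for each candidate model gives you concrete evidence when explaining why the final model beats the alternatives.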
Creating a Classification Model
In this step you must create a classification model that uses your source data to predict whether teams will make the playoffs. Follow the same process as for the prediction model, using classification tools to split the teams into the two groups.
Note: Use the source data to build a classification model targeted on the playoff status, then use that model to predict the playoff status on the subject data.
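As an illustration of the same flow for classification, here is a minimal sketch using k-nearest neighbours. That technique is an assumption chosen for brevity, not the method the brief requires; any classification tool from the course could take its place. The two-feature rows, the "Yes"/"No" labels, and the function name are hypothetical placeholders.

```python
import math
from collections import Counter


def knn_classify(train_features, train_labels, query, k=3):
    """Label a query row by majority vote among its k nearest training rows."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Source data: normalized feature rows with known playoff status (placeholders).
train_features = [(1.2, 0.8), (0.9, 1.1), (0.7, 0.5),
                  (-0.8, -1.0), (-1.1, -0.6), (-0.5, -0.9)]
train_labels = ["Yes", "Yes", "Yes", "No", "No", "No"]

# Subject data: same features, playoff column blank.
subject_rows = [(1.0, 0.9), (-0.9, -0.8)]

predictions = [knn_classify(train_features, train_labels, row)
               for row in subject_rows]
print(predictions)
```

Because k-nearest neighbours relies on distances, it works best when the features have been normalized first.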
Overall Goal
If you're at all unclear, the main goal of all of this is to:
• Use past experience - all the seasons that have been completed, along with all of the stats collected during those seasons - to create two predictive models: one that guesses the number of wins, and another that guesses whether a team makes the playoffs.
• Use those models to make predictions for the most recent season, where we (pretend we) don't know the real number of wins or playoff status, but we do know all the stats.