Create an analytical dataset

Reference no: EM132860531

A pet store chain is selecting the location for its next store. You will use data preparation techniques to build a robust analytic dataset, then build a predictive model to select the best location.

Project Overview

This project is the first part of a two-part series. In the first part, you will blend and format data and deal with outliers.

In the second part, you will use your cleaned-up dataset to create another linear regression model. The difference this time is that you will have to choose which variables are the most important for the model, using the new techniques learned in the Selecting Predictor Variables section.

Scenario

Pawdacity is a leading pet store chain in Wyoming with 13 stores throughout the state. This year, Pawdacity would like to expand and open a 14th store. Your manager has asked you to perform an analysis to recommend the city for Pawdacity's newest store, based on predicted yearly sales.

How Do I Complete this Project?
This project uses skills learned throughout the "Data Preparation" lessons. To complete this project:
• Go through the course.
• Apply the skills learned in the course to solve the business problem given in the project details.
• Use our guidelines and rubric to help build your project.
• When you're ready, submit it to us for review using the submission template found in the supporting materials section.

Skills Required
In order to complete this project, you must be able to:
• Understand different data types. Review Lesson 1: Understanding Data.
• Deal with a variety of data issues. Review Lesson 2: Data Issues.
• Format data appropriately. Review Lesson 3: Data Formatting.
• Blend data together using joins and unions. Review Lesson 4: Data Blending.

The Business Problem
Your first step in predicting yearly sales is to format and blend data from the different datasets and deal with outliers.
Your manager has given you the following information to work with:
1. The monthly sales data for all of the Pawdacity stores for the year 2010.
2. NAICS data on the most recent sales of all competitor stores, where total sales covers 12 months of sales.
3. A partially parsed data file that can be used for population numbers.
4. Demographic data (Households with individuals under 18, Land Area, Population Density, and Total Families) for each city and county in the state of Wyoming. For people who are unfamiliar with the US city system: a state contains counties, and each county contains one or more cities.

Map of Wyoming Counties

Steps to Success
Step 1: Business and Data Understanding
Your project should include a description of the key business decisions that need to be made.
Step 2: Building the Training Set
To properly build the model and select predictor variables, create a dataset with the following columns:
• City
• 2010 Census Population
• Total Pawdacity Sales
• Households with Under 18
• Land Area
• Population Density
• Total Families
This dataset will be your training set to help you build a regression model in order to predict sales in the Practice Project in the next lesson. Every row should have sales data because we're trying to predict sales.
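As a rough illustration of the blending step, the sketch below sums monthly sales per city and then inner-joins the totals with other per-city data, so that every remaining row has sales data. It is a minimal sketch, not the course's method: the function names, the `City A`/`City B` records, and all numbers are made-up placeholders, and the real files will need cleaning before they fit this shape.

```python
from collections import defaultdict


def total_sales_by_city(store_rows, month_columns):
    """Sum the monthly sales of every store, consolidated at the city level."""
    totals = defaultdict(float)
    for row in store_rows:
        totals[row["City"]] += sum(row[month] for month in month_columns)
    return dict(totals)


def blend_on_city(sales_by_city, *lookups):
    """Inner-join city sales totals with other per-city lookups.

    Each lookup maps a city name to a dict of extra columns. A city is
    kept only if it has sales data and appears in every lookup.
    """
    blended = []
    for city, sales in sorted(sales_by_city.items()):
        if all(city in lookup for lookup in lookups):
            row = {"City": city, "Total Pawdacity Sales": sales}
            for lookup in lookups:
                row.update(lookup[city])
            blended.append(row)
    return blended


# Hypothetical placeholder records, NOT values from the project files.
stores = [
    {"City": "City A", "January": 100.0, "February": 150.0},
    {"City": "City A", "January": 50.0, "February": 25.0},
    {"City": "City B", "January": 200.0, "February": 300.0},
]
population = {
    "City A": {"2010 Census Population": 1000},
    "City B": {"2010 Census Population": 2000},
    "City C": {"2010 Census Population": 3000},  # no store, so it is dropped
}

sales = total_sales_by_city(stores, ["January", "February"])
training_set = blend_on_city(sales, population)
```

The inner join is a deliberate choice here: a city without a Pawdacity store has no sales to learn from, so it does not belong in the training set.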

Notes
You should consolidate the data at the city level, not at the store level. We only have data at the citywide level, so any analysis at the store level will not be sufficient to complete this analysis.
We simply need to focus on cleaning up and blending the data together in this step.
If you've done everything correctly, the sum for each of the above columns should be:
• Census Population: 213,862
• Total Pawdacity Sales: 3,773,304
• Households with Under 18: 34,064
• Land Area: 33,071
• Population Density: 63
• Total Families: 62,653
The dataset should contain 11 rows of data.
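The column sums and row count listed above can be checked automatically. The following is a minimal sketch that assumes the blended dataset is a list of dicts keyed by the column names from Step 2; the function name `check_training_set` is hypothetical, and rounding each sum to a whole number is an assumption based on how the expected totals are printed.

```python
# Expected totals copied from the notes above.
EXPECTED_SUMS = {
    "2010 Census Population": 213_862,
    "Total Pawdacity Sales": 3_773_304,
    "Households with Under 18": 34_064,
    "Land Area": 33_071,
    "Population Density": 63,
    "Total Families": 62_653,
}
EXPECTED_ROWS = 11


def check_training_set(rows, expected_sums=EXPECTED_SUMS, expected_rows=EXPECTED_ROWS):
    """Return a list of mismatch messages; an empty list means all checks pass."""
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"row count: expected {expected_rows}, got {len(rows)}")
    for column, expected in expected_sums.items():
        # Assumption: the published sums are rounded to whole numbers.
        actual = round(sum(row[column] for row in rows))
        if actual != expected:
            problems.append(f"{column}: expected {expected}, got {actual}")
    return problems
```

Calling `check_training_set(training_set)` on your own blended rows should return an empty list if the blend matches the expected totals; any message it returns names the column to revisit.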
For Alteryx users:
• Use the Auto Field tool to quickly convert your data fields to the appropriate data types for analysis.
• Research these three specific formulas to help you get rid of unwanted characters in the Formula tool: ReplaceFirst, Left, FindString
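For readers not using Alteryx, the same kind of cleanup can be sketched in Python with `str.find`, slicing, and `str.replace` with a count of 1, which play roughly the roles of FindString, Left, and ReplaceFirst. The raw strings used below are hypothetical; the actual format of the partially parsed file is not described here, so treat the markers (`<`, `,`) as assumptions to adapt.

```python
def strip_after(text, marker):
    """Keep only the characters to the left of the first `marker`.

    Roughly Left(text, FindString(text, marker)); the text is returned
    unchanged when the marker is absent.
    """
    position = text.find(marker)  # -1 when the marker is not found
    return text if position == -1 else text[:position]


def remove_first(text, unwanted):
    """Delete only the first occurrence of `unwanted` (like ReplaceFirst)."""
    return text.replace(unwanted, "", 1)


def parse_population(raw):
    """Turn a raw cell such as '12,345<sup>...' into the integer 12345."""
    cleaned = strip_after(raw, "<")      # drop trailing markup residue
    cleaned = cleaned.replace(",", "")   # drop thousands separators
    return int(cleaned.strip())


# Hypothetical raw value, not taken from the project file.
example = parse_population("12,345<sup>[1]</sup>")
```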

Step 3: Dealing with Outliers
Once you have created the dataset, look for outliers and decide how to deal with them. Use the IQR method to determine whether any city is an outlier for each of the variables, then justify which city with at least one outlier value should be removed.

IQR Steps
To calculate the upper fence and the lower fence, here are the exact steps:
1. Calculate the 1st quartile (Q1) and the 3rd quartile (Q3) of the dataset. You can use the Excel function QUARTILE.INC or QUARTILE.EXC.
2. Calculate the interquartile range: IQR = Q3 - Q1
3. Add 1.5 × IQR to Q3 to get the upper fence: Upper Fence = Q3 + 1.5 × IQR
4. Subtract 1.5 × IQR from Q1 to get the lower fence: Lower Fence = Q1 - 1.5 × IQR
5. Values above the Upper Fence or below the Lower Fence are outliers.
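The steps above can be sketched in Python with the standard library. This is an illustrative sketch under stated assumptions: `iqr_fences` and `find_outliers` are hypothetical helper names, the sample rows are made up, and the `"inclusive"` and `"exclusive"` methods of `statistics.quantiles` are intended to play the roles of Excel's QUARTILE.INC and QUARTILE.EXC (worth confirming against Excel on your own data).

```python
from statistics import quantiles


def iqr_fences(values, method="inclusive"):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    q1, _median, q3 = quantiles(values, n=4, method=method)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(rows, column, method="inclusive"):
    """List the cities whose value in `column` falls outside the fences."""
    lower, upper = iqr_fences([row[column] for row in rows], method)
    return [row["City"] for row in rows if row[column] < lower or row[column] > upper]


# Made-up placeholder values, NOT the project data.
sample_values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
sample_rows = [
    {"City": f"City {i}", "Land Area": value}
    for i, value in enumerate(sample_values, start=1)
]

fences = iqr_fences(sample_values)                        # (-3.0, 13.0)
outlier_cities = find_outliers(sample_rows, "Land Area")  # ['City 9']
```

Running `find_outliers` once per numeric column gives the list of cities with at least one outlier value; deciding which of them to remove is still a judgment call to justify in the write-up.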

Attachment:- Analytical Dataset.rar


