Write python code to load your dataset into a pandas data

Reference no: EM132314589

Assignment

A medium-size Australian company (imaginary) has given you one year of data about the online purchases that their customers have made.  They want you to analyse the data using statistical and machine learning techniques and produce:

a prediction algorithm for predicting how much money each customer is likely to spend in a year;

a classification algorithm for predicting which customers will be 'big spenders';

some recommendations on what marketing strategy they should use to attract more 'big spender' customers.

Instructions


Follow all the instructions in this notebook to complete these tasks.  Note that some cells contain 'assert' statements - these will automatically mark your work so that you can check that you have done the preceding steps correctly.  (If they give errors, go back and correct your previous work until those errors are fixed.  Once those 'assert' cells execute without errors, you know that you have achieved the marks for that step.)

When you have finished, this notebook is the only file that you will need to submit to Blackboard.

Note: If you want some space to try out some Python code of your own, feel free to add extra cells to this notebook.  Before you submit your notebook, make sure that those extra cells either execute without error or have been deleted.

Overview

You have five sections to complete in this Notebook:

Part A: Load and Clean Data

Part B: Data Exploration

Part C: Predicting Spending Levels

Part D: Predicting Big Spenders

Part E: Business Recommendations

Part A: Load and Clean Data


Save your CSV data file into the same folder as this notebook.

Write Python code to load your dataset into a Pandas DataFrame called 'sales'.
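A minimal sketch of the loading step. The file name `sales.csv` and every column name other than SpendValue and Purchases are assumptions made for illustration, and the three rows are invented so that the example runs without the real file.

```python
import io

import pandas as pd

# Invented stand-in for the CSV file; the real file name and columns
# come from your own dataset.
csv_text = """CustomerID,State,Sex,Income,Purchases,SpendValue
1,NSW,0,52000,12,830.50
2,VIC,1,78000,20,1490.00
3,QLD,1,31000,4,210.75
"""

# With a real file saved next to the notebook this would be, for example:
#     sales = pd.read_csv("sales.csv")   # hypothetical file name
sales = pd.read_csv(io.StringIO(csv_text))

print(sales.shape)   # (rows, columns)
print(sales.dtypes)  # check that numeric columns were parsed as numbers
```

Checking `shape` and `dtypes` straight after loading is a quick way to spot a wrong delimiter or a numeric column that was read as text.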

Part B: Data Exploration

In this section, you will explore the data statistically and visually, to get a feel for what kinds of data you have, and how much people are spending on your web site.

B.1 Data Inspection

Start by using the Pandas **describe()** function to analyse all the numeric columns of your 'sales' DataFrame.  Spend some time looking at this and making sure that you understand the average (mean) and range (min and max) of each column.
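As a rough illustration of this step, the sketch below calls `describe()` on a tiny invented table. The "State" column name and all values are assumptions, not data from the assignment.

```python
import pandas as pd

# Invented sample rows standing in for the real 'sales' DataFrame.
sales = pd.DataFrame({
    "State": ["NSW", "VIC", "QLD", "NSW"],
    "Purchases": [12, 20, 4, 8],
    "SpendValue": [830.5, 1490.0, 210.75, 640.0],
})

# By default describe() summarises only the numeric columns,
# so the text column "State" is left out of the result.
summary = sales.describe()
print(summary)

# Pull out just the rows the question asks about.
print(summary.loc[["mean", "min", "max"], ["Purchases", "SpendValue"]])
```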

Data Inspection Questions

In the next cell, write your observations about the "SpendValue" and "Purchases" columns.  For each column, say what the average value is and discuss what that means in terms of your sales to an average person.  Also discuss the min and max values.

Based on the "SpendValue" column, explain how much your "big spenders" (the top 25% of your clients) are spending each year.  This will be a range of values, such as from 1000 to 2000 dollars.

Your discussion must all be in the next cell.

Add three level-2 headings in that cell to break your discussion into topics: "Purchases column", "SpendValue column", and "Big Spenders".
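One possible way to find the "big spender" range is to take the 75th percentile of SpendValue as the lower bound and the maximum as the upper bound. The sketch below assumes that reading and uses eight invented spend values.

```python
import pandas as pd

# Invented spend values; replace with sales["SpendValue"] in the notebook.
spend = pd.Series([100, 200, 300, 400, 500, 600, 700, 800], name="SpendValue")

# The 75th percentile is the cutoff: 25% of customers spend at least this much.
cutoff = spend.quantile(0.75)

# Customers at or above the cutoff are treated as big spenders here.
big = spend[spend >= cutoff]

print(f"Big spenders spend from {big.min()} to {big.max()} dollars "
      f"(cutoff {cutoff}).")
```

The same cutoff appears as the `75%` row in the output of `describe()`.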

B.2 Differences between States


We want to know where most of our customers live and whether customers from certain areas spend more or less than average.  Write some Pandas code to calculate and display the total **number of customers** in each Australian state (NSW, QLD, VIC, etc.) and their average **SpendValue**. 

Hint: you could do this by *grouping* your 'sales' table, or by *looping* through all the states, or several other ways.
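The grouping approach from the hint might look like the sketch below. The column name "State" is an assumption about the dataset, and the rows are invented.

```python
import pandas as pd

# Invented rows; in the notebook this would be the real 'sales' DataFrame.
sales = pd.DataFrame({
    "State": ["NSW", "VIC", "NSW", "QLD", "VIC", "NSW"],
    "SpendValue": [800.0, 1200.0, 1000.0, 300.0, 1000.0, 600.0],
})

# One row per state: how many customers, and their average SpendValue.
by_state = (
    sales.groupby("State")["SpendValue"]
    .agg(customers="count", avg_spend="mean")
    .sort_values("customers", ascending=False)
)
print(by_state)
```

Sorting by customer count puts the most populated states first, which makes it easier to see where customers are concentrated.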

Question:

Discuss these graphs and explain your conclusions.

For example, are there *significant* differences in the average spend in different states?  Are our customers spread evenly across Australia, or concentrated in particular areas?

Write your answer in the next cell, and give reasons for your conclusions.

Part C: Predicting Spending Levels

Using the LinearRegression class from the Scikit-Learn library (sklearn), build a machine learning model for predicting the expected SpendValue for a customer.

Measure the performance of your model using 10-fold cross-validation with a test set size of 20% and print various measures of how accurate your predictions are.
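A sketch of one way to set up this evaluation. "10-fold cross-validation with a test set size of 20%" is read here as ten random 80/20 splits (`ShuffleSplit`), which is an interpretation: a strict 10-fold split would hold out 10% per fold. The feature columns and the synthetic data are assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_validate

# Synthetic stand-in data (not from the assignment dataset).
rng = np.random.default_rng(0)
n = 200
sex = rng.integers(0, 2, n)
income = rng.uniform(20_000, 160_000, n)
purchases = rng.integers(1, 30, n)
X = np.column_stack([sex, income, purchases])
y = 0.005 * income + 40 * purchases + rng.normal(0, 50, n)  # fake SpendValue

# Ten random splits, each holding out 20% of the rows for testing.
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
scores = cross_validate(
    LinearRegression(), X, y, cv=cv,
    scoring=("r2", "neg_mean_absolute_error"),
)

r2 = scores["test_r2"]
mae = -scores["test_neg_mean_absolute_error"]  # flip sign back to dollars
print(f"R^2: {r2.mean():.3f} +/- {r2.std():.3f}")
print(f"MAE: {mae.mean():.1f} dollars")
```

Scikit-Learn reports error metrics as negative scores so that larger is always better, which is why the mean absolute error is negated before printing.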

Analysis of Results

Print out the linear regression coefficients for all the input features, so that you can see which ones are more significant and which ones are unimportant.

Hint 1: Since the scales of the input features are so different (0-1 for sex, 0-160000 for income, etc.), multiply the linear regression coefficients by the average value of the corresponding column, to see how many dollars that column contributes to the total predicted-spend answer.

Hint 2: Could you graph the predicted and actual SpendValue values of the test data, to see visually how good the linear regression results are?
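Hint 1 can be sketched as follows. The column names "Sex" and "Income" and the synthetic data are assumptions for illustration only. The point is that a raw coefficient and its dollar contribution at the column mean can rank features very differently.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (not from the assignment dataset).
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "Sex": rng.integers(0, 2, n),
    "Income": rng.uniform(20_000, 160_000, n),
})
y = 100 * X["Sex"] + 0.01 * X["Income"] + rng.normal(0, 20, n)

model = LinearRegression().fit(X, y)

# Coefficient times column mean = dollars contributed for an average customer.
contrib = pd.DataFrame(
    {"coefficient": model.coef_, "column_mean": X.mean().to_numpy()},
    index=X.columns,
)
contrib["dollars_at_mean"] = contrib["coefficient"] * contrib["column_mean"]
print(contrib)
```

In this made-up example the Income coefficient is tiny compared with the Sex coefficient, yet Income contributes far more dollars to the prediction because its values are so much larger.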

Discussion:

Discuss your conclusions about this linear regression model (in the next cell).  Which input features are most significant?

Part D: Predicting Big Spenders


In this section we want to build some machine learning models to predict whether a new customer is likely to be a big spender or not.  This will be a binary outcome (yes or no), so we can use machine learning *classification* algorithms.

Remember that our definition of 'Big-Spender' is a client whose annual spending level (**SpendValue**) is in the top 25% of our clients.  So the exact dollar cutoff for big spenders will be different for each student, as each of you is working for a different company and using a different dataset.

Choose two classification algorithms.  Use each one to build and then evaluate a 'big-spender' prediction model.
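A sketch of how two classifiers could be compared. Logistic regression and a decision tree are arbitrary picks for illustration, not the required choice, and the features and data are synthetic assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not from the assignment dataset).
rng = np.random.default_rng(2)
n = 400
sex = rng.integers(0, 2, n)
income = rng.uniform(20_000, 160_000, n)
X = np.column_stack([sex, income])
spend = 0.01 * income + 100 * sex + rng.normal(0, 150, n)

# Label: 1 if the customer's spend is in the top 25%, else 0.
y = (spend >= np.quantile(spend, 0.75)).astype(int)

models = {
    # Scaling first helps logistic regression cope with the large income values.
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

accuracy = {}
for name, model in models.items():
    accuracy[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean accuracy {accuracy[name]:.3f}")
```

Because only 25% of customers are big spenders, a model that always answers "no" already scores 75% accuracy, so results are worth comparing against that baseline.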

Discussion:

Discuss your conclusions about your two classification models (in the next cell).

Which classification algorithm gives the more accurate results?

How accurate are the results from your best classifier?

Part E: Business Recommendations

The company you are doing this analysis for wants some recommendations from you about how to find new customers who are likely to be big spenders.  They are wondering whether they should focus their advertising on a particular gender, on people in a given state such as Victoria or NSW, on demographic groups with high or medium income levels, or on some other strategy.  What recommendations will you give them?

Write about 100 words describing your conclusions from your analysis, and your recommendations for the best strategy for attracting new big-spender customers.

Attachment:- Assignment File.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd