Write python code to load your dataset into a pandas data

Assignment Help Python Programming
Reference no: EM132314589

Assignment

A medium-size Australian company (imaginary) has given you one year of data about the online purchases that their customers have made.  They want you to analyse the data using statistical and machine learning techniques and produce:

a prediction algorithm for predicting how much money each customer is likely to spend in a year;

a classification algorithm for predicting which customers will be 'big spenders';

some recommendations on what marketing strategy they should use to attract more 'big spender' customers.

Instructions


Follow all the instructions in this notebook to complete these tasks.  Note that some cells contain 'assert' statements - these will automatically mark your work so that you can check that you have done the preceeding steps correctly.  (If they give errors, then go back and correct your previous work until you fix those errors.  Once those 'assert' cells execute without errors, you know that you have achieved the marks for that step.)

When you have finished, this notebook is the only file that you will need to submit to Blackboard.

Note: If you want some space to try out some Python code of your own, feel free to add extra cells into this notebook.  Just make sure that before you submit your notebook, that those extra cells execute without error, or that you delete them before submitting.

Overview

You have five sections to complete in this Notebook

Part A: Load and Clean Data

Part B Data Exploration

Part C: Predicting Spending Levels

Part D: Predicting Big Spenders

Part E: Business Recommendations

Part A: Load and Clean Data


Save your CSV data file into the same folder as this notebook.

Write Python code to load your dataset into a Pandas DataFrame called 'sales.

Part B Data Exploration

In this section, you will explore the data statistically and visually, to get a feel for what kinds of data you have, and how much people are spending on your web site.

B.1 Data Inspection

Start by using the Pandas **describe()** function to analyse all the numeric columns of your 'sales' DataFrame.  Spend some time looking at this and making sure that you understand the average (mean) and range (min and max) of each column.

Data Inspection Questions

In the next cell, write your observations about the \"SpendValue\" and \"Purchases\" columns.  For each column, say what the average value is and discuss what that means in terms of your sales to an average person.  Also discuss the min and max values.

Based on the \"SpendValue\" column, explain how much your \"big spenders\" (the top 25% percent of your clients) are spending each year.  This will be a range of values, such as from 1000 to 2000 dollars.

Your discussion must all be in the next cell.

Add three level-2 headings in that cell to break your discussion into topics: \"Purchases column\", \"SpendValue column\", and \"Big Spenders\.

B.2 Differences between States


We want to know where most of our customers live and whether customers from certain areas spend more or less than average.  Write some Pandas code to calculate and display the total **number of customers** in each Australian state (NSW, QLD, VIC, etc.) and their average **SpendValue**. 

Hint: you could do this by *grouping* your 'sales' table, or by *looping* through all the states, or several other ways.

Question:

Discuss these graphs and explain your conclusions.

For example, are there *significant* differences in the average spend in different states?  Are our customer spread evenly across Australia, or concentrated in particular areas?

Write your answer in the next cell, and give reasons for your conclusions?

Part C: Predicting Spending Levels

Using the LinearRegression function from the Scikit-Learn library (sklearn), build a machine learning model for predicting the expected SpendValue for a customer.

Measure the performance of your model using 10-fold cross-validation with a test set size of 20% and print various measures of how accurate your predictions are.

Analysis of Results

Print out the linear regression coefficients for all the input features, so that you can see which ones are more significant and which ones are unimportant.

Hint 1: Since the scale of the input features is so different (0-1 for sex, 0-160000 for income, etc) multiply the linear regression coefficients by the average value of the corresponding column, to see how many dollars that column contributes to the total predicated-spend answer.

Hint 2: Could you graph the predicted and actual spendvalues of the test data, to visually see how good the linear regression results are?

Discussion:

Discuss your conclusions about this linear regression model (in the next cell).  Which input features are most significant?

Part D: Predicting Big Spenders


In this section we want to build some machine learning models predict if a new customer is likely to be a big spender or not.  This will be a binary outcome (yes or no), so we can use machine learning *classification* algorithms.

Remember that our definition of 'Big-Spender' is that it is a client whose annual spending level (**SpendValue**) is in the top 25% of our clients.  So the exact dollar cutoff for big spenders will be different for each student, as each of you are working for a different company and are using a different dataset.

Choose two classification algorithms.  Use each one to build and then evaluate a 'big-spender' prediction model.

Discussion:

Discuss your conclusions about your two classification models (in the next cell).

Which classification algorithm gives the more accurate results?

How accurate are the results from your best classifier?

Part E: Business Recommendations

The company you are doing this analysis for wants some recommendations from you about how to find new customers who are likely to be big spenders.  They are wondering if they should focus their advertising on a particular gender?  Or people in a given state, such as Victoria, or NSW?  Or aim at demographic groups who have high income level or medium income levels?  Or other strategies?  What recommendations will you give them?

Write about 100 words describing your conclusions from your analysis, and your recommendations for the best strategy for attracting new big-spender customers.

Attachment:- Assignment File.rar

Reference no: EM132314589

Questions Cloud

Weakness in the performance review process : Based on your own experience, what is the most serious weakness in the performance review process? How can it be changed?
What are some of the reasons thinking ethically : What are some of the reasons "thinking ethically" means different things to different people?
Are economic systems a form of moral philosophy : 1. Are Economic Systems a Form of Moral Philosophy? Explain.
Schyndelworks Website Project : COSC2737 IT Infrastructure and Security Assignment - Schyndelworks Website Project, RMIT University, Australia. Creation of Web-Server-A
Write python code to load your dataset into a pandas data : A medium-size Australian company imaginary has given you one year of data about the online purchases that their customers have made.
What are some of the social and environmental considerations : What are some of the social and environmental considerations you would need to manage at a CAR launch festival to ensure it is considered to be a sustainable
Evaluating events is a three stage process : Please help me answering this business/events question: Evaluating events is a three stage process:
Meet expected performance targets : Support team members to meet expected performance targets, including providing formal and informal learning opportunities as needed.
Network redesign and demonstration : Evaluate performance metrics and dimensions according to specifications - Apply concepts and theories of human factors as related to network design

Reviews

Write a Review

Python Programming Questions & Answers

  Write a python program to implement the diff command

Without using the system() function to call any bash commands, write a python program that will implement a simple version of the diff command.

  Write a program for checking a circle

Write a program for checking a circle program must either print "is a circle: YES" or "is a circle: NO", appropriately.

  Prepare a python program

Prepare a Python program which evaluates how many stuck numbers there are in a range of integers. The range will be input as two command-line arguments.

  Python atm program to enter account number

Write a simple Python ATM program. Ask user to enter their account number, and print their initail balance. (Just make one up). Ask them if they wish to make deposit or withdrawal.

  Python function to calculate two roots

Write a Python function main() to calculate two roots. You must input a,b and c from keyboard, and then print two roots. Suppose the discriminant D= b2-4ac is positive.

  Design program that asks user to enter amount in python

IN Python Design a program that asks the user to enter the amount that he or she has budget in a month. A loop should then prompt the user to enter his or her expenses for the month.

  Write python program which imports three dictionaries

Write a Python program called hours.py which imports three dictionaries, and uses the data in them to calculate how many hours each person has spent in the lab.

  Write python program to create factors of numbers

Write down a python program which takes two numbers and creates the factors of both numbers and displays the greatest common factor.

  Email spam filter

Analyze the emails and predict whether the mail is a spam or not a spam - Create a training file and copy the text of several mails and spams in to it And create a test set identical to the training set but with different examples.

  Improve the readability and structural design of the code

Improve the readability and structural design of the code by improving the function names, variables, and loops, as well as whitespace. Move functions close to related functions or blocks of code related to your organised code.

  Create a simple and responsive gui

Please use primarily PHP or Python to solve the exercise and create a simple and responsive GUI, using HTML, CSS and JavaScript.Do not use a database.

  The program is to print the time

The program is to print the time in seconds that the iterative version takes, the time in seconds that the recursive version takes, and the difference between the times.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd