Evaluate the clustering result using accuracy

Reference no: EM132304495

Background

In this assignment, you will analyse an open dataset about a marketing campaign of a Portuguese bank in order to design strategies for improving future marketing campaigns. The objective of the campaign is to persuade customers to subscribe to a term deposit. The marketing campaigns were based on phone calls. The dataset contains the call information, with the attributes listed in Table 2.1.

Task Description

We provide one IPython notebook, Task2.ipynb, together with a csv file, bank.csv, in the data sub-folder. You are required to analyse this dataset in the IPython notebook using Spark packages, including spark.sql and pyspark.ml.

Table 2.1: Attribute information of the dataset

Attribute Meaning
age age of the customer
job type of job
marital marital status
education education level
default has credit in default?
balance the balance of the customer
housing has housing loan?
loan has personal loan?
contact contact communication type
day last contact day of the week
month last contact month of year
duration last contact duration, in seconds
campaign number of contacts performed
pdays number of days that passed by after a previous campaign
previous number of contacts performed before this campaign
poutcome outcome of the previous marketing campaign
deposit has the client subscribed a term deposit?

Python Notebook

To investigate this dataset systematically, your IPython notebook should follow these six basic steps:

(1) Import the csv file, "bank.csv", as a Spark dataframe named df, then check and explore its individual attributes.

(2) Select the important attributes from df into a new dataframe, df2, for further investigation. You are required to select 13 attributes from df: 'age', 'job', 'marital', 'education', 'default', 'balance', 'housing', 'loan', 'campaign', 'pdays', 'previous', 'poutcome' and 'deposit'.

(3) Remove all invalid rows from the dataframe df2 using spark.sql. A row is invalid if at least one of its attributes contains 'unknown'. For the attribute 'poutcome', the only valid values are 'failure' and 'success'.

(4) Convert all categorical attributes in df2 to numerical attributes using one-hot encoding, then apply Min-Max normalisation to each attribute.

(5) Perform unsupervised learning on df2, including k-means and PCA. For k-means, you can use the whole of df2 as both training and testing data and evaluate the clustering result using Accuracy. For PCA, you can generate a scatter plot of the first two components to investigate the data distribution.

(6) Perform supervised learning on df2, including Logistic Regression, Decision Tree and Naive Bayes. For the three classification methods, you can use 70% of df2 as the training data and the remaining 30% as the testing data, and evaluate their prediction performance using Accuracy.
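
The validity rule in step (3) can be sketched in plain Python before it is translated into a spark.sql query. This is only a minimal sketch, assuming each row is held as a dictionary; the helper name is_valid_row and the sample rows are hypothetical and are not part of the provided notebook or data.

```python
# Hypothetical helper illustrating the step (3) rule; not from the provided notebook.
VALID_POUTCOME = {"failure", "success"}


def is_valid_row(row):
    """Return True if no attribute is 'unknown' and poutcome is 'failure' or 'success'."""
    if any(value == "unknown" for value in row.values()):
        return False
    return row.get("poutcome") in VALID_POUTCOME


# Made-up sample rows, for illustration only.
rows = [
    {"job": "admin.", "education": "secondary", "poutcome": "success"},
    {"job": "unknown", "education": "tertiary", "poutcome": "failure"},
    {"job": "services", "education": "primary", "poutcome": "other"},
]

valid_rows = [r for r in rows if is_valid_row(r)]
print(len(valid_rows))
```

In spark.sql the same rule would probably become a WHERE clause with one inequality test against 'unknown' per categorical column, plus a test that poutcome is one of the two valid values.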
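
The arithmetic behind step (4) can be illustrated without Spark. The sketch below assumes a fixed, known list of categories and a plain list of numbers; the function names and sample values are hypothetical. In the notebook itself, the classes StringIndexer, OneHotEncoder and MinMaxScaler from pyspark.ml.feature would normally do this work.

```python
def one_hot(value, categories):
    """Encode value as a 0/1 list with one position per category."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max_scale(values):
    """Rescale values linearly to [0, 1].

    A constant column is mapped to 0.0; that is a choice made for this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sample values, for illustration only.
marital_categories = ["divorced", "married", "single"]
encoded = one_hot("married", marital_categories)
scaled = min_max_scale([20, 35, 50, 80])

print(encoded)
print(scaled)
```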
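
Step (5) asks for Accuracy on a clustering result, but k-means cluster ids are arbitrary and do not correspond to 'yes' or 'no' by themselves. The assignment does not say how the ids should be matched to labels. One common convention, assumed here, is to try every one-to-one mapping from cluster ids to labels and report the best accuracy. The function name and sample data are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(cluster_ids, true_labels):
    """Best accuracy over all one-to-one mappings from cluster ids to labels.

    Assumes there are no more distinct clusters than distinct labels.
    """
    clusters = sorted(set(cluster_ids))
    labels = sorted(set(true_labels))
    best = 0.0
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best = max(best, correct / len(true_labels))
    return best


# Made-up cluster assignments and 'deposit' labels, for illustration only.
cluster_ids = [0, 0, 1, 1, 1, 0]
deposit = ["yes", "yes", "no", "no", "yes", "yes"]

accuracy = clustering_accuracy(cluster_ids, deposit)
print(accuracy)
```

With two clusters and two labels there are only two mappings to compare, so the exhaustive search is cheap. For larger numbers of clusters a matching algorithm would be a better fit.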

Case Study Report

Based on your IPython notebook results, you are required to write a case study report of 500 to 1000 words, which should include the following information:

(1) The data attribute distribution

(2) The methods/algorithms you used for data wrangling and processing

(3) The performance of both unsupervised and supervised learning on the data

(4) The important features which affect the objective ('yes' in 'deposit') [Hint: you can refer to the coefficients generated by the Logistic Regression]

(5) Discuss the possible reasons for obtaining these analysis results and how to improve them

(6) Describe the group activities, such as the task distribution for group members and what you have learnt during this project.
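
For the hint in report item (4), one possible way to read the Logistic Regression coefficients is to rank features by absolute coefficient value. Because the inputs are Min-Max normalised, the magnitudes are roughly comparable across features. The sketch below uses made-up coefficient values and a hypothetical helper name; in pyspark.ml, a fitted binary logistic regression model exposes its weights through the coefficients attribute.

```python
def rank_features(names, coefficients):
    """Sort (name, coefficient) pairs by absolute coefficient, largest first."""
    pairs = zip(names, coefficients)
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Made-up coefficients, for illustration only; real values come from the fitted model.
feature_names = ["age", "balance", "campaign"]
coefficients = [0.1, -0.7, 0.3]

ranked = rank_features(feature_names, coefficients)
for name, weight in ranked:
    print(name, weight)
```

The sign of each coefficient indicates the direction of the effect on the predicted class, so it is worth reporting alongside the magnitude.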



