MIS772 Predictive Analytics - Deakin Business School
Assessment - Data Analysis and Report
Learning Outcome 1: Understand and apply key statistical theories and data mining concepts
This assignment aims to help students learn how to:
• Articulate problems and solutions in business terms
• Gain insights from data
• Prepare data for different models
• Develop classification models
Case Study Description
This case concerns a Portuguese banking institution with a growing customer base. The bank manager would like to use data analytics and machine learning to analyse customer data from the bank's direct marketing campaigns. The campaigns were based on phone calls, and often more than one contact with the same client was required.
The data set contains 45,211 observations with 17 variables, as described below:
bank client data:
1 - age (numeric)
2 - job: type of job (categorical)
3 - marital: marital status (categorical)
4 - education: education level (categorical)
5 - default: has credit in default? (binary)
6 - balance: average yearly balance, in euros (numeric)
7 - housing: has housing loan? (binary)
8 - loan: has personal loan? (binary)
related to the last contact of the current campaign:
9 - contact: contact communication type (categorical)
10 - day: last contact day of the month (numeric)
11 - month: last contact month of year (categorical)
12 - duration: last contact duration, in seconds (numeric)
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical)
17 - y: has the client subscribed to a term deposit? (binary)
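The pdays attribute uses -1 as a sentinel rather than a real day count, so it may help to separate the "previously contacted" flag from the number of days before modelling. The following is a minimal Python sketch of that idea only; the helper name split_pdays is hypothetical, the sample values are made up, and the assignment itself expects this preparation to be done in RapidMiner.

```python
from typing import Optional, Tuple


def split_pdays(pdays: int) -> Tuple[bool, Optional[int]]:
    """Split pdays into (previously_contacted, days_since_contact).

    -1 is the sentinel for "client was not previously contacted", so it
    should not be treated as a real number of days.
    """
    if pdays == -1:
        return False, None
    return True, pdays


# Illustrative values only, not taken from the data set.
for value in (-1, 0, 92):
    print(value, split_pdays(value))
```

Keeping the flag and the count apart avoids a distance-based model such as k-NN reading -1 as "one day before zero".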
Financial AI is interested in generating insights about the clients, in particular by answering the questions below:
A. What is the distribution of customer age by marital status?
B. What are the five most popular occupations among the bank's customers? Among them, which occupation has the highest average yearly balance? Which occupation has the most people who have completed tertiary education?
C. How can we reliably predict whether a client will subscribe to the term deposit? Define appropriate measures and compare the performance of different classifiers in predicting clients' subscriptions.
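Questions A and B are grouped summaries of the attributes listed above. As a rough illustration of the aggregations involved, here is a Python sketch using only the standard library. The rows are invented for the example and are not taken from the data set, the function names are hypothetical, and the "tertiary" value for education is an assumption based on the wording of question B. In the assignment these summaries would be produced and visualised in RapidMiner.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Made-up rows for illustration only; real values come from the data set.
rows = [
    {"age": 30, "marital": "single", "job": "technician", "education": "tertiary", "balance": 1200},
    {"age": 45, "marital": "married", "job": "management", "education": "tertiary", "balance": 3000},
    {"age": 52, "marital": "married", "job": "blue-collar", "education": "primary", "balance": 400},
    {"age": 38, "marital": "divorced", "job": "management", "education": "secondary", "balance": 1000},
    {"age": 27, "marital": "single", "job": "blue-collar", "education": "secondary", "balance": 600},
    {"age": 41, "marital": "married", "job": "management", "education": "tertiary", "balance": 2000},
]


def age_summary_by_marital(rows):
    """Question A: (min, median, max) age per marital status."""
    ages = defaultdict(list)
    for r in rows:
        ages[r["marital"]].append(r["age"])
    return {m: (min(a), median(a), max(a)) for m, a in ages.items()}


def top_jobs(rows, n=5):
    """Question B: the n most frequent occupations, most frequent first."""
    return [job for job, _ in Counter(r["job"] for r in rows).most_common(n)]


def mean_balance_by_job(rows, jobs):
    """Question B: average balance for each of the given occupations."""
    return {j: mean([r["balance"] for r in rows if r["job"] == j]) for j in jobs}


def tertiary_count_by_job(rows, jobs):
    """Question B: number of tertiary-educated clients per occupation."""
    counts = Counter(r["job"] for r in rows if r["education"] == "tertiary")
    return {j: counts[j] for j in jobs}


jobs = top_jobs(rows)
print(age_summary_by_marital(rows))
print(jobs)
print(mean_balance_by_job(rows, jobs))
print(tertiary_count_by_job(rows, jobs))
```

A three-number summary is only a stand-in for a distribution; a histogram or box plot per marital status would answer question A more fully.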
Financial AI wants you to use RapidMiner to process and explore the provided data, and then to develop and evaluate classifiers that predict customers' subscriptions to the term deposit while minimising misclassifications.
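"Minimising misclassifications" needs a measure, and if the label turns out to be imbalanced, accuracy alone can look good for a classifier that never predicts a subscription. The Python sketch below shows one possible set of measures (accuracy, precision, recall and F1 for the "yes" class). The function names and the label values "yes" and "no" are assumptions for illustration, and the example labels are made up. RapidMiner reports comparable measures through its own performance operators.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return (tp, fp, fn, tn) for the given positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_measures(actual, predicted, positive="yes"):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: one subscriber in ten, and a classifier that always says "no".
actual = ["no"] * 9 + ["yes"]
always_no = ["no"] * 10
print(classification_measures(actual, always_no))
```

In this toy case the accuracy is high while recall for "yes" is zero, which is why a measure focused on the subscribing class is worth reporting alongside accuracy.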
Task and Deliverables:
• Executive Summary: Define your problem and solution in business terms. In doing so, answer questions A, B and C, and cross-reference other report sections for support.
• Data Exploration, Pattern Discovery, and Preparation: Visualise the selected attribute characteristics. Use the visualisations to support answering questions A and B.
Prepare data for predictive modelling. Transform attributes or create new ones as needed. Use appropriate analysis and data visualisation to investigate relationships between attributes (predictors and label). Interpret results.
• Predictive Modelling: Create and explain two classification models, e.g., k-NN and Decision Tree, to address part of question C. Explain and justify each model's properties. Investigate and deal with any class imbalance.
• Model evaluation and improvement: Validate each model using hold-out and cross-validation. Use honest testing. Compare the performance of the different models and select the best. Qualify how much the answer to question C can be trusted.
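The evaluation steps above can be pictured as a data-splitting scheme: one stratified hold-out portion is set aside and never used for training or tuning, and cross-validation runs on the remainder. The Python sketch below shows that scheme with the standard library only. It reads "honest testing" as evaluation on rows the model has never seen, which is an interpretation rather than something the brief spells out. The function names, the fold count and the made-up labels are all assumptions. In RapidMiner the same idea would be built from its data-splitting and cross-validation operators.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign row indices to k folds so each fold has similar class ratios."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def holdout_then_cv(labels, k=5, seed=0):
    """Reserve one stratified fold as the test set; build k CV splits from the rest."""
    folds = stratified_folds(labels, k + 1, seed)
    test, cv_folds = folds[0], folds[1:]
    splits = []
    for j, validation in enumerate(cv_folds):
        train = [i for m, fold in enumerate(cv_folds) if m != j for i in fold]
        splits.append((train, validation))
    return test, splits


# Made-up imbalanced labels for illustration only.
labels = ["no"] * 48 + ["yes"] * 12
test, splits = holdout_then_cv(labels, k=5)
print("hold-out rows:", len(test))
for train, validation in splits:
    print("train:", len(train), "validation:", len(validation))
```

Models would be compared on the cross-validation splits, and only the selected model would be scored once on the hold-out rows. Any rebalancing of the classes would be applied to the training rows only, so that the validation and hold-out rows keep the original class ratio.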
Attachment: Data Analysis and Report.rar