Visualisation and model development assessment

Reference no: EM133567790, Length: word count: 1000

Big Data and Analytics

Assessment - Visualisation and Model Development

Learning Outcome 1: Apply data science principles to the cleaning, manipulation and visualisation of data;
Learning Outcome 2: Design analytical models based on a given problem; and
Learning Outcome 3: Effectively report and communicate findings to an appropriate audience.

Task Summary

Customer churn, also known as customer attrition, refers to the movement of customers from one service provider to another. It is well known that attracting new customers costs significantly more than retaining existing ones. Additionally, long-term customers tend to be less costly to serve and less sensitive to competitors' marketing activities. Predicting customer churn is therefore valuable to telecommunication companies, utility service providers, pay-television channels, insurance companies and other organisations that provide subscription-based services, because it allows retention efforts to be targeted at the customers most likely to leave.

In this Assessment, you will build a machine learning (ML) model to predict customer churn using the principles of ML and big data tools.

As part of this Assessment, you will write a 1,000-word report that will include the following:
a) A predictive model from a given dataset that follows data mining principles and techniques;
b) Explanations as to how to handle missing values in a dataset; and
c) An interpretation of the outcomes of the customer churn analysis.

Task Instructions

1. Dataset Construction

The Kaggle Telco churn dataset is a sample dataset from IBM containing 21 attributes for 7,043 telecommunication customers. In this Assessment, you are required to work with a modified version of this dataset (the dataset can be found at the URL provided below). Modify the dataset by removing the following five attributes: MonthlyCharges, OnlineSecurity, StreamingTV, InternetService and Partner.
Because the dataset is in .csv format, any spreadsheet application, such as Microsoft Excel or OpenOffice Calc, can be used to modify it. Use the resulting dataset, which should comprise 7,043 observations and 16 attributes (the original 21 minus the 5 removed), to complete the subsequent tasks. The ‘Churn’ attribute (i.e., the last attribute in the dataset) is the target of your churn analysis.
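If you prefer to script the column removal instead of editing the file in a spreadsheet, the following is a minimal sketch using only Python's standard csv module. It assumes the file has a header row with the standard IBM/Kaggle column names; the function name drop_columns is hypothetical, not part of any library.

```python
import csv

# The five attributes the brief asks you to remove.
COLUMNS_TO_DROP = {"MonthlyCharges", "OnlineSecurity", "StreamingTV",
                   "InternetService", "Partner"}


def drop_columns(src, dst, to_drop=COLUMNS_TO_DROP):
    """Copy CSV rows from file-like `src` to `dst`, omitting the named columns.

    Returns the retained column names in their original order.
    """
    reader = csv.DictReader(src)
    missing = to_drop - set(reader.fieldnames)
    if missing:
        # Fail loudly rather than silently keeping 21 columns.
        raise KeyError(f"columns not found: {sorted(missing)}")
    kept = [name for name in reader.fieldnames if name not in to_drop]
    # extrasaction="ignore" makes the writer skip the dropped keys in each row.
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept
```

For the real file, open the input with `open(path, newline="")` and the output with `open(out_path, "w", newline="")` (file names are up to you) and pass both handles to `drop_columns`. The returned list should contain 16 names ending in `Churn`, which is a quick check that the modified dataset matches the brief.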

2. Model Development
From the dataset constructed in the previous step, present appropriate data visualisations and descriptive statistics, then develop a ‘decision-tree’ model to predict customer churn. The model can be developed in Jupyter Notebook using Python and Spark's machine learning library (PySpark MLlib), although you may use any other platform if you find it more efficient. The notebook should include the following sections:
a) Problem Statement
In this section, briefly state the context and the problem you will solve in the notebook.
b) Exploratory Data Analysis
In this section, perform both visual and statistical exploratory analysis to gain insights into the dataset.
c) Data Cleaning and Feature Selection
In this section, perform data pre-processing and feature selection for the model, which you will build in the next section.
d) Model Building
In this section, use the pre-processed data and the selected features to build a ‘decision-tree’ model to predict customer churn.
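For sections b) and c), two common first steps can be sketched without any library: the churn rate for each category of a categorical attribute (the numbers behind a bar chart, and a starting point for describing who is churning), and parsing a numeric column so that blank entries are counted as missing instead of breaking the conversion. In the original IBM file a few TotalCharges entries are blank strings; treat that as something to verify in your own copy. In a notebook you would more likely use pandas or Spark DataFrame methods; the function names below are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def churn_rate_by(rows, column, target="Churn", positive="Yes"):
    """Return {category: (row_count, churn_rate)} for one categorical column."""
    totals = Counter()
    churned = Counter()
    for row in rows:
        key = row[column]
        totals[key] += 1
        if row[target] == positive:
            churned[key] += 1
    # Missing keys in a Counter count as 0, so never-churning groups get 0.0.
    return {k: (totals[k], churned[k] / totals[k]) for k in sorted(totals)}


def to_float_or_none(value):
    """Parse a numeric string; blank or malformed entries become None (missing)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def describe(values):
    """Basic descriptive statistics for a numeric column, ignoring missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
        "min": min(present),
        "max": max(present),
    }
```

Comparing churn rates across the categories of each retained attribute (for example Contract) is also a simple, defensible basis for the feature selection in section c).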
In the notebook, the code should be well documented, graphs and charts should be neatly labelled, and the narrative text should clearly state the objectives and give a logical justification for each step.
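In PySpark MLlib the model in section d) would normally be built with `pyspark.ml.classification.DecisionTreeClassifier`, and the fitted model's `featureImportances` reports per-feature importance. To show what such a model computes, here is a small library-free sketch of a CART-style tree on categorical features: each node chooses the equality test (feature == value) with the largest reduction in Gini impurity, and those reductions, weighted by the share of rows reaching the node, are summed per feature as an importance score. This is an illustration under simplifying assumptions (categorical features only, binary equality splits, a fixed depth limit), not the MLlib implementation, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum over classes of p_k**2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (gain, feature, value) for the best 'feature == value' test, or None."""
    parent = gini(labels)
    n = len(labels)
    best = None
    for feature in features:
        for value in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] == value]
            right = [y for row, y in zip(rows, labels) if row[feature] != value]
            if not left or not right:
                continue  # a test that sends every row one way is useless
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, feature, value)
    return best


def build_tree(rows, labels, features, max_depth=3, importance=None, n_total=None):
    """Grow a small CART-style tree on categorical features.

    A leaf is the majority label (a string); an internal node is the tuple
    (feature, value, subtree_if_equal, subtree_otherwise). `importance`
    accumulates each feature's impurity decrease weighted by the share of
    training rows that reach the node.
    """
    if importance is None:
        importance = Counter()
    if n_total is None:
        n_total = len(labels)
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels, features) if max_depth > 0 else None
    if split is None or split[0] <= 0:
        return majority, importance
    gain, feature, value = split
    importance[feature] += gain * len(labels) / n_total
    match = [i for i, row in enumerate(rows) if row[feature] == value]
    other = [i for i, row in enumerate(rows) if row[feature] != value]
    left, _ = build_tree([rows[i] for i in match], [labels[i] for i in match],
                         features, max_depth - 1, importance, n_total)
    right, _ = build_tree([rows[i] for i in other], [labels[i] for i in other],
                          features, max_depth - 1, importance, n_total)
    return (feature, value, left, right), importance


def predict(tree, row):
    """Follow the tests from the root down to a leaf and return its label."""
    while isinstance(tree, tuple):
        feature, value, if_equal, otherwise = tree
        tree = if_equal if row[feature] == value else otherwise
    return tree
```

Because each contribution is weighted by the fraction of rows reaching the node, splits near the root usually dominate the importance scores; the feature with the largest total is a reasonable reading of "the most important attribute" for Task 3. Divide by the sum of the scores if you want them to add up to 1.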

3. Handling Missing Values
The given dataset has very few missing values; however, in real-world scenarios, data scientists often need to work with datasets that contain many missing values. If an attribute is important for building an effective model and has a significant number of missing values, data scientists need a strategy for handling those missing values.
Using the ‘decision-tree’ model built in the previous step, identify the most important attribute. Assuming that a significant number of values were missing from this attribute's column, implement a method to replace the missing values and describe that method in your report.
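The brief leaves the choice of imputation method open. A minimal sketch of two standard options, assuming rows are dictionaries such as those produced by `csv.DictReader`: fill missing categorical values with the mode, optionally the mode within groups defined by another attribute (conditional mode imputation), and fill missing numeric values with the median. Which attribute turns out to be most important depends on your model, so any column names you pass in are your own choice; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def impute_mode(rows, column, group_by=None):
    """Fill missing values of a categorical column in place; return how many were filled.

    Without group_by, every gap gets the overall most frequent value. With
    group_by, a gap gets the most frequent value among rows sharing the same
    group_by value, falling back to the overall mode for unseen groups.
    """
    observed = [r[column] for r in rows if not is_missing(r[column])]
    overall_mode = Counter(observed).most_common(1)[0][0]
    group_modes = {}
    if group_by is not None:
        counts = defaultdict(Counter)
        for r in rows:
            if not is_missing(r[column]):
                counts[r[group_by]][r[column]] += 1
        group_modes = {g: c.most_common(1)[0][0] for g, c in counts.items()}
    filled = 0
    for r in rows:
        if is_missing(r[column]):
            key = r[group_by] if group_by is not None else None
            r[column] = group_modes.get(key, overall_mode)
            filled += 1
    return filled


def impute_median(rows, column):
    """Convert a numeric column to float in place, filling gaps with the observed median."""
    observed = [float(r[column]) for r in rows if not is_missing(r[column])]
    fill = median(observed)
    filled = 0
    for r in rows:
        if is_missing(r[column]):
            r[column] = fill
            filled += 1
        else:
            r[column] = float(r[column])
    return filled
```

Group-wise imputation usually preserves relationships between attributes better than a single overall mode, but any imputation can bias the model when values are not missing at random, which is worth discussing in the report.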

4. Interpretation of Churn Analysis
Modelling churn is difficult because there is inherent uncertainty when measuring churn. Thus, it is important not only to understand any limitations associated with a churn analysis but also to be able to interpret the outcomes of a churn analysis.
In your report, interpret and describe the key findings that you were able to discover as part of your churn analysis. Describe the following facts with supporting details:
- The effectiveness of your churn analysis: What percentage of the time did your analysis correctly identify churn? Can this be considered a satisfactory outcome? Explain why or why not;
- Who is churning: Describe the attributes of the customers who are churning and explain what is driving the churn; and
- Improving the accuracy of your churn analysis: Describe the effects that your previous steps (model development and the handling of missing values) had on the outcome of your churn analysis, and explain how the accuracy of your churn analysis could be improved.
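The percentage of the time the analysis correctly identifies churn can be read as accuracy, but churners are usually a minority class, so it helps to also report recall on the churn class and to compare accuracy with a majority-class baseline (always predicting "No"). A minimal sketch with hypothetical function names and toy labels:

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count (tp, fp, tn, fn) with `positive` as the churn label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def churn_metrics(actual, predicted, positive="Yes"):
    """Accuracy, precision and recall for churn, plus the majority-class baseline."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    larger_class = max(tp + fn, tn + fp)  # actual churners vs actual non-churners
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "baseline_accuracy": larger_class / total,
    }


if __name__ == "__main__":
    actual = ["Yes", "Yes"] + ["No"] * 8
    predicted = ["Yes", "No"] + ["No"] * 7 + ["Yes"]
    print(churn_metrics(actual, predicted))
    # {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5, 'baseline_accuracy': 0.8}
```

In the toy case, 80% accuracy merely equals the baseline of always predicting "No", and only half of the churners are found; a result like that would not be satisfactory even though the headline accuracy looks reasonable.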

Attachment:- Visualisation and Model Development.rar
