Reference no: EM132983038, Length: 2,000 words
MIS772 Predictive Analytics - Deakin University
Data Analysis and Report
Learning Outcome 1: Understand and apply key statistical theories and data mining concepts
Assignment Objectives
This assignment aims to help students learn how to:
- Articulate problems and solutions in business terms
- Gain insights from data
- Prepare data for different models
- Develop classification models
- Assess and report model performance
Case Study Description
SkytraxAI has approached you to develop a RapidMiner process that assesses travellers' experience with major airlines around the world, so that practical implications can be offered to airline managers for improving their services and travellers' experience, as measured by overall ratings.
SkytraxAI has provided you with a sample of more than 41,000 customer reviews of 362 airlines, posted between 2002 and 2015. The data set includes the following attributes:
• airline_name: Name of the airline
• link: relative URL path to the airline webpage on Skytrax.
• title: review title
• author: user id of the reviewer on Skytrax.
• author_country: the traveller's country of origin.
• date: date when the review was posted to Skytrax
• cabin_flow: type of cabin flown by the traveller
• overall_rating: a number reflecting the traveller's level of satisfaction, from 1 (lowest) to 10 (highest).
• seat_comfort_rating: rating for seat comfort, from 1 (lowest) to 5 (highest).
• cabin_staff_rating: rating for cabin staff service, from 1 (lowest) to 5 (highest).
• food_beverage_rating: rating for food and drink served on board, from 1 (lowest) to 5 (highest).
• inflight_entertainment_rating: rating for the inflight entertainment system, from 1 (lowest) to 5 (highest).
• value_money_rating: rating for value for money, from 1 (lowest) to 5 (highest).
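The attributes above can be loaded into typed records before any analysis. The following is a minimal Python sketch, not part of the required RapidMiner process. It assumes the unzipped file is a CSV whose header uses the attribute names listed above (the brief does not state the file format), and the two sample rows are hypothetical. Blank cells become `None`, so missing values stay visible for the later preparation step.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Sub-rating attributes from the brief (each scored 1 to 5)
RATING_FIELDS = [
    "seat_comfort_rating",
    "cabin_staff_rating",
    "food_beverage_rating",
    "inflight_entertainment_rating",
    "value_money_rating",
]


@dataclass
class Review:
    airline_name: str
    author_country: Optional[str]
    cabin_flow: Optional[str]
    overall_rating: Optional[float]  # 1 to 10
    sub_ratings: Dict[str, Optional[float]] = field(default_factory=dict)


def to_float(cell: Optional[str]) -> Optional[float]:
    """Convert a CSV cell to float; blank or absent cells become None."""
    if cell is None or cell.strip() == "":
        return None
    return float(cell)


def to_text(cell: Optional[str]) -> Optional[str]:
    """Strip whitespace; blank or absent cells become None."""
    if cell is None or cell.strip() == "":
        return None
    return cell.strip()


def load_reviews(text: str) -> List[Review]:
    """Parse CSV text (assumed header: the attribute names above) into Review records."""
    reviews = []
    for row in csv.DictReader(io.StringIO(text)):
        reviews.append(Review(
            airline_name=row["airline_name"].strip(),
            author_country=to_text(row.get("author_country")),
            cabin_flow=to_text(row.get("cabin_flow")),
            overall_rating=to_float(row.get("overall_rating")),
            sub_ratings={name: to_float(row.get(name)) for name in RATING_FIELDS},
        ))
    return reviews


# Hypothetical two-row sample; the second row lacks a country and an overall rating
SAMPLE = (
    "airline_name,author_country,cabin_flow,overall_rating,seat_comfort_rating,"
    "cabin_staff_rating,food_beverage_rating,inflight_entertainment_rating,value_money_rating\n"
    "airline-a,Australia,Economy,8,4,5,4,,4\n"
    "airline-b,,Business Class,,3,3,2,1,2\n"
)
reviews = load_reviews(SAMPLE)
```

Keeping missing cells as `None` rather than zero avoids silently treating an unanswered rating as the lowest score.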
SkytraxAI is interested in generating insights about the airlines, in particular by answering the following questions:
A. What are the (top 10) most popular airlines in this data set? Among them, which airline received the best traveller experience in terms of overall rating? Which airlines offer the best value for money?
B. What are the (top 5) most popular airlines among travellers from the two countries with the most travellers?
C. What are the (top 2) most popular airlines for business travel? Market competitors are defined as businesses that target similar groups of customers. Based on this definition, identify the market(s) (author_country) in which these two airlines compete for business travellers.
D. How can we reliably predict whether travellers are satisfied with an airline (overall rating ≥ 4)? Define appropriate measures of traveller satisfaction and compare the performance of different classifiers in predicting traveller satisfaction with airlines.
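Questions A to C reduce to counting and averaging. The following is a hypothetical Python sketch of that logic (the assignment itself requires RapidMiner). It assumes that popularity means number of reviews, that "best" means the highest mean rating, that business travel is identified by a `cabin_flow` value of `"Business Class"`, and that two airlines compete in a country when each has at least `min_reviews` business-class reviews from travellers of that country. The helper names and the rows are invented for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Optional, Tuple


class Review(NamedTuple):
    airline_name: str
    author_country: Optional[str]
    cabin_flow: Optional[str]
    overall_rating: Optional[float]
    value_money_rating: Optional[float]


def most_popular(rows: List[Review], n: int) -> List[str]:
    """Airlines ranked by review count (assumed popularity proxy)."""
    return [name for name, _ in Counter(r.airline_name for r in rows).most_common(n)]


def mean_rating(rows: List[Review], airlines: List[str], attr: str) -> Dict[str, float]:
    """Question A: mean of one rating attribute per airline, skipping missing values."""
    wanted = set(airlines)
    values = defaultdict(list)
    for r in rows:
        score = getattr(r, attr)
        if r.airline_name in wanted and score is not None:
            values[r.airline_name].append(score)
    return {name: mean(v) for name, v in values.items()}


def top_airlines_by_top_countries(rows: List[Review], n_countries: int = 2,
                                  n_airlines: int = 5) -> Dict[str, List[str]]:
    """Question B: most reviewed airlines within each of the largest traveller countries."""
    counts = Counter(r.author_country for r in rows if r.author_country)
    countries = [c for c, _ in counts.most_common(n_countries)]
    return {c: most_popular([r for r in rows if r.author_country == c], n_airlines)
            for c in countries}


def business_competitor_markets(rows: List[Review], cabin: str = "Business Class",
                                min_reviews: int = 1) -> Tuple[Tuple[str, str], List[str]]:
    """Question C: top two business airlines and the countries where both have business travellers."""
    business = [r for r in rows if r.cabin_flow == cabin]
    first, second = most_popular(business, 2)
    counts = Counter((r.airline_name, r.author_country) for r in business if r.author_country)
    countries = {country for _, country in counts}
    markets = sorted(c for c in countries
                     if counts[(first, c)] >= min_reviews and counts[(second, c)] >= min_reviews)
    return (first, second), markets


# Hypothetical rows for illustration only
ROWS = [
    Review("airline-a", "UK", "Economy", 7.0, 4.0),
    Review("airline-a", "UK", "Business Class", 9.0, 5.0),
    Review("airline-a", "US", "Business Class", 8.0, 4.0),
    Review("airline-b", "UK", "Business Class", 6.0, 3.0),
    Review("airline-b", "Canada", "Business Class", 4.0, 2.0),
    Review("airline-c", "US", "Economy", 2.0, None),
]
top10 = most_popular(ROWS, 10)
overall = mean_rating(ROWS, top10, "overall_rating")
value = mean_rating(ROWS, top10, "value_money_rating")
best_overall = max(overall, key=overall.get)
best_value = max(value, key=value.get)
by_country = top_airlines_by_top_countries(ROWS)
pair, markets = business_competitor_markets(ROWS)
```

Raising `min_reviews` makes the competitor definition stricter, which guards against declaring a shared market on the strength of a single review.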
SkytraxAI wants you to use RapidMiner to process and explore the provided data, and then to develop and evaluate classifiers that predict traveller satisfaction while minimising misclassifications. The data set is available on the Cloud Deakin site as MIS771 A1 data.zip; you will need to unzip the file before importing it into RapidMiner.
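The satisfaction label and the misclassification counts can be made concrete with a short Python sketch. The ≥ 4 threshold on `overall_rating` comes from question D; the label names, the choice of `satisfied` as the positive class, and the selection of metrics are assumptions, and the ratings and predictions are hypothetical. Reporting recall and specificity alongside accuracy matters because accuracy alone can look high when one class dominates.

```python
from typing import Dict, List, Optional

SATISFIED, DISSATISFIED = "satisfied", "dissatisfied"
THRESHOLD = 4.0  # from question D: satisfied when overall_rating >= 4


def satisfaction_label(overall_rating: Optional[float]) -> Optional[str]:
    """Derive the binary class label; reviews without an overall rating stay unlabelled."""
    if overall_rating is None:
        return None
    return SATISFIED if overall_rating >= THRESHOLD else DISSATISFIED


def confusion_counts(actual: List[str], predicted: List[str],
                     positive: str = SATISFIED) -> Dict[str, int]:
    """Count true/false positives and negatives, treating `positive` as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarise(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, misclassification count, precision, recall and specificity."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    total = sum(c.values())
    return {
        "accuracy": ratio(c["tp"] + c["tn"], total),
        "misclassified": c["fp"] + c["fn"],
        "precision": ratio(c["tp"], c["tp"] + c["fp"]),
        "recall": ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": ratio(c["tn"], c["tn"] + c["fp"]),
    }


# Hypothetical overall ratings and classifier predictions
ACTUAL = [satisfaction_label(r) for r in [9.0, 7.0, 2.0, 3.0, 4.0]]
PREDICTED = [SATISFIED, DISSATISFIED, DISSATISFIED, SATISFIED, SATISFIED]
counts = confusion_counts(ACTUAL, PREDICTED)
report = summarise(counts)
```

A rating of exactly 4.0 is labelled `satisfied`, matching the "≥ 4" wording in the brief.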
Task and Deliverables:
• Executive Summary: Define your problem and solution in business terms; in doing so, answer questions A, B, C and D, cross-referencing other report sections for support.
• Data Exploration, Pattern Discovery, and Preparation: Visualise the selected attribute characteristics. Use the visualisations to support answering questions A, B and C.
Handle duplicates and missing values to prepare the data for predictive modelling. Transform attributes or create new ones as needed. Use appropriate analysis and data visualisation to investigate relationships between attributes (predictors and label), and interpret the results.
• Predictive Modelling: Create and explain two classification models, e.g., k-NN and Decision Tree, to address part of question D. Explain and justify your models' properties. Investigate and deal with any class imbalance.
• Model Evaluation and Improvement: Evaluate the models using both hold-out validation and cross-validation, and use honest testing. Compare the performance of the different models and select the best. Qualify how much we can trust the answer to question D.
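The preparation and class-imbalance steps in the deliverables can be sketched as follows. This Python illustration assumes that a duplicate review shares the same `airline_name`, `author` and `date`, that missing sub-ratings are filled with the median, and that imbalance is handled by random undersampling (one option among several, alongside oversampling or class weights). The helper names and sample rows are hypothetical. To keep testing honest, the medians and the undersampling should be derived from the training partition only.

```python
import random
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List


def deduplicate(rows: List[dict], key_fields: Iterable[str]) -> List[dict]:
    """Keep the first occurrence of each review, identified by the values in key_fields."""
    fields = tuple(key_fields)
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(train: List[dict], rows: List[dict], fields: Iterable[str]) -> List[dict]:
    """Return copies of `rows` with missing values filled by medians computed from `train` only."""
    fills = {}
    for f in fields:
        observed = [r[f] for r in train if r.get(f) is not None]
        fills[f] = median(observed) if observed else None
    return [{**r, **{f: v for f, v in fills.items() if r.get(f) is None}} for r in rows]


def undersample(rows: List[dict], label_field: str, seed: int = 0) -> List[dict]:
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class: Dict[str, List[dict]] = defaultdict(list)
    for r in rows:
        by_class[r[label_field]].append(r)
    size = min(len(group) for group in by_class.values())
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], size))
    rng.shuffle(balanced)
    return balanced


# Hypothetical rows: the first two are duplicates, the third lacks a seat rating
ROWS = [
    {"airline_name": "a", "author": "u1", "date": "2015-01-01", "seat_comfort_rating": 4, "label": "satisfied"},
    {"airline_name": "a", "author": "u1", "date": "2015-01-01", "seat_comfort_rating": 4, "label": "satisfied"},
    {"airline_name": "b", "author": "u2", "date": "2015-02-01", "seat_comfort_rating": None, "label": "satisfied"},
    {"airline_name": "b", "author": "u3", "date": "2015-03-01", "seat_comfort_rating": 2, "label": "dissatisfied"},
    {"airline_name": "c", "author": "u4", "date": "2015-04-01", "seat_comfort_rating": 5, "label": "satisfied"},
]
unique = deduplicate(ROWS, ["airline_name", "author", "date"])
filled = impute_median(unique, unique, ["seat_comfort_rating"])
balanced = undersample(filled, "label")
```

Undersampling discards data; with more than 41,000 reviews that cost is usually tolerable, but the choice should be justified in the report.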
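Hold-out validation, cross-validation and honest testing fit together as in this Python sketch: the data is split once into stratified training and test partitions, the k-NN parameter `k` is chosen by stratified 5-fold cross-validation on the training partition only, and the untouched test partition is scored once at the end. The hand-written k-NN, the two-dimensional rating vectors, the 70/30 split and the candidate values of `k` are illustrative assumptions; a Decision Tree could be evaluated in the same way.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (rating vector, class label)


def group_by_class(data: List[Example]) -> Dict[str, List[Example]]:
    groups: Dict[str, List[Example]] = {}
    for ex in data:
        groups.setdefault(ex[1], []).append(ex)
    return groups


def knn_predict(train: List[Example], x: Sequence[float], k: int) -> str:
    """Majority label among the k training examples nearest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train: List[Example], test: List[Example], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def stratified_holdout(data: List[Example], test_share: float = 0.3, seed: int = 0):
    """Split each class separately so both partitions keep the original class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    groups = group_by_class(data)
    for label in sorted(groups):
        group = groups[label][:]
        rng.shuffle(group)
        cut = round(len(group) * test_share)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test


def stratified_folds(data: List[Example], n_folds: int = 5, seed: int = 0) -> List[List[Example]]:
    """Deal each class's shuffled examples round-robin into n_folds folds."""
    rng = random.Random(seed)
    folds: List[List[Example]] = [[] for _ in range(n_folds)]
    groups = group_by_class(data)
    for label in sorted(groups):
        group = groups[label][:]
        rng.shuffle(group)
        for i, ex in enumerate(group):
            folds[i % n_folds].append(ex)
    return folds


def cross_validate(data: List[Example], k: int, n_folds: int = 5, seed: int = 0):
    """Mean and standard deviation of k-NN accuracy over stratified folds."""
    folds = stratified_folds(data, n_folds, seed)
    scores = []
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(accuracy(train, test, k))
    return mean(scores), stdev(scores)


# Hypothetical, clearly separable rating vectors for illustration only
DATA: List[Example] = (
    [((4.0 + (i % 2) * 0.5, 4.0), "satisfied") for i in range(10)]
    + [((1.0, 1.0 + (i % 2) * 0.5), "dissatisfied") for i in range(10)]
)
CANDIDATE_K = [1, 3, 5]

train, test = stratified_holdout(DATA)                 # test partition is set aside
best_k = max(CANDIDATE_K, key=lambda k: cross_validate(train, k)[0])
final_accuracy = accuracy(train, test, best_k)          # scored once, at the end
```

The standard deviation returned by `cross_validate` gives a rough sense of how stable the estimate is, which is one input when qualifying how far the answer to question D can be trusted.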
Attachment: Predictive Analytics.rar