Reference no: EM133765467, Length: 2,500 words
Applied Modelling and Visualisation, MSc Management with Data Analytics
Assessment - MAV - SilkySky International Airlines Report
Assessment Brief:
For this assignment you are working as a Data Analytics Consultant for SilkySky International Airlines and have been asked to prepare a Consultancy Report based on the airline's passenger 'satisfaction' data set. This report and your findings will be used in a 'visually appealing' presentation to the CEO, Senior Flight personnel and Cabin Crew at the Annual Staff Conference, and it has been proposed that some interactive elements will be placed securely on the company intranet.
You are provided with a set of data SILKYSKY_DATA_CW2.csv that summarises the levels of passenger 'satisfaction'. The file contains over 103,000 rows of information from the UK National Airlines database system for the current calendar year. Your objective is to use machine learning principles to model and visualise key data with a view to helping staff better understand what factors impacted levels of 'satisfaction' for passengers using the airline.
Your summative submission should be a written report in MS Word format (NOT a PDF file) and should be at most 2,500 words. It should describe how applied modelling and visualisation can be used to present summaries of passenger data. Your report will inform a corporate presentation, so it should be appropriately tailored to a rich and varied audience consisting of the CEO, Senior Flight personnel and Cabin Crew. You are also required to carry out independent research into the different categories of 'satisfaction' and the techniques used to analyse and forecast data in your report.
You must complete all the following tasks:
(IL01 - Formulate innovative data driven solutions to commercial problems) TASK 1: Develop a data-driven solution to the given scenario (IL01).
The solution must use two analytical models to predict the scale and accuracy of the airline's data using the Python programming language and relevant Python libraries taking into consideration the following guidance notes.
Task 1 - Data-Driven Solution Guidance notes:
You should provide a data-driven solution that:
• Follows an established design methodology (e.g. PPDAC or CRISP-DM or SDLC), including flowcharts and pseudocode
• Performs an Extract, Transform, and Load (ETL) process (including importing, cleaning and preparing the data for analysis, whilst ensuring that the relevant training, validation and test sets are created)
• Performs Exploratory Data Analysis (EDA) with appropriate visualisations
• Trains and tests TWO analytical models
• Evaluates the models based on your choice of loss function
• Produces appropriate visualisations of your results
• Describes the solution development process
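As a minimal illustration of the training, validation and test sets mentioned in the ETL guidance note above, the sketch below splits labelled rows while keeping the class proportions similar in each set. The column name "satisfaction", the label values, the toy rows and the 70/15/15 split are all assumptions made for the example; none of them is specified in this brief. Libraries such as scikit-learn (train_test_split with its stratify argument) offer comparable functionality.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, percentages=(70, 15, 15), seed=42):
    """Split rows into train/validation/test lists with similar class ratios.

    The percentages are integers (assumed 70/15/15 here) so that the
    set sizes are computed with exact integer arithmetic.
    """
    rng = random.Random(seed)

    # Group the rows by class label so each class is split separately
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    train, val, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = len(members) * percentages[0] // 100
        n_val = len(members) * percentages[1] // 100
        train.extend(members[:n_train])
        val.extend(members[n_train:n_train + n_val])
        test.extend(members[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Hypothetical toy rows standing in for the cleaned data set
rows = [
    {"id": i,
     "satisfaction": "satisfied" if i % 5 < 2 else "neutral or dissatisfied"}
    for i in range(100)
]
train, val, test = stratified_split(rows, "satisfaction")
print(len(train), len(val), len(test))
```

Splitting each class separately is one way to stop a rare class from being under-represented in the smaller validation and test sets.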
You should choose two from the following models:
• Logistic regression
• Decision Tree
• Bagging
• Random Forest
• AdaBoost
• XGBoost
• Artificial neural network
• Another appropriate state-of-the-art algorithm
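To give a feel for what training one of the listed models involves, the sketch below fits a logistic regression by batch gradient descent on the mean log loss, using only the Python standard library. The single scaled feature, the toy labels, the learning rate and the number of epochs are hypothetical choices for illustration. It is not a prescribed approach; a submitted solution would normally rely on an established library implementation.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: predicted probability minus label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Hypothetical toy data: one scaled rating feature, 1 = satisfied
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
preds = [1 if p >= 0.5 else 0 for p in probs]
print(preds)
```

The fitted weight can be read as the direction and strength of a feature's association with 'satisfaction', which is one reason logistic regression is often used as an interpretable baseline next to tree-based ensembles.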
(IL02 - Critically evaluate the use of algorithms and models when developing analytical solutions)
Task 2: Critically analyse the two models chosen for your solution in Task 1 (IL02)
Critically analyse the two models chosen for your solution in Task 1, and in particular, the strengths and limitations of each model using the guidance notes provided below with references to the relevant literature.
Task 2 Guidance notes:
Your critical analysis must also include:
• An explanation of your chosen loss function
• A short discussion of the accuracy metrics
• A summary table of the accuracy metrics of the two chosen models to support the selection of the best model
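The loss function and accuracy metrics listed above can be made concrete with a small worked sketch. It assumes a binary target (1 = satisfied) and uses log loss (binary cross-entropy) as the example loss function; the brief leaves the choice of loss function open. The labels and predicted probabilities for "Model A" and "Model B" are invented toy values, not results from the airline data.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Return log loss plus accuracy, precision, recall and F1 at a threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "log_loss": log_loss(y_true, y_prob),
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels and predicted probabilities (not real results)
y_true = [1, 0, 1, 1, 0, 0]
model_outputs = {
    "Model A": [0.9, 0.2, 0.6, 0.4, 0.1, 0.7],
    "Model B": [0.8, 0.3, 0.7, 0.6, 0.2, 0.4],
}
summary = {name: classification_metrics(y_true, probs)
           for name, probs in model_outputs.items()}

# Print a plain-text summary table, one row per model
columns = ["log_loss", "accuracy", "precision", "recall", "f1"]
print("model".ljust(10) + "".join(c.rjust(11) for c in columns))
for name, metrics in summary.items():
    print(name.ljust(10) + "".join(f"{metrics[c]:11.3f}" for c in columns))
```

Log loss penalises confident wrong probabilities, whereas accuracy only counts thresholded predictions, so the two can rank models differently. That difference is worth addressing when justifying the selection of the best model.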
(IL03 - Critically appraise the concepts, tools and techniques for data visualisation)
Task 3: Communicate your findings supported by several outputs from Task 1 (IL03)
Communicate your findings supported by several outputs from Task 1, including graphical outputs such as a correlation matrix, heat map, and confusion matrix, using the guidance notes provided below.
Task 3 Guidance notes:
Your critical appraisal should be based on your findings in Task 1, and must also include:
• An analysis of how the Exploratory Data Analysis (EDA) output guided your selection of the analytical models
• A justification for performing EDA and for the use of appropriate descriptive statistics and visualisations to understand the results of that analysis
• A recommendation of the use of one model for sustaining or increasing the rate of 'satisfaction'
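The correlation matrix and confusion matrix named in Task 3 are both small tables of numbers before they are drawn as heat maps. The sketch below computes each from scratch, assuming a binary target and Pearson correlation. The column names ("seat_comfort", "delay_minutes", "satisfied") and all values are hypothetical and are not taken from the supplied CSV. In a notebook, the equivalent outputs would typically come from a data-frame library and be rendered with a plotting library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


def confusion_matrix(y_true, y_pred):
    """2x2 counts laid out as [[TN, FP], [FN, TP]] (rows = actual class)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Hypothetical toy columns (names and values are assumptions)
columns = {
    "seat_comfort": [1, 2, 3, 4, 5],
    "delay_minutes": [50, 40, 30, 20, 10],
    "satisfied": [0, 0, 1, 1, 1],
}
corr = correlation_matrix(columns)
for name, row in corr.items():
    print(name.ljust(15) + "".join(f"{value:8.2f}" for value in row.values()))

# Hypothetical actual and predicted labels for the confusion matrix
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

A correlation matrix summarises linear relationships among features during EDA, while a confusion matrix summarises a fitted model's errors, so they support different parts of the report.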
3. Research and Referencing
Your report should include a list of the references used to develop the report and the research that supports the suggested approach. The list should use only the Harvard Referencing System as highlighted in the General Assessment Guidance section of this document. All figures and tables used in the report must have captions and, wherever needed, be properly referenced and explained in your submission.
Suggested report format
Cover page (University cover sheet)
Table of Contents
List of Abbreviations (if appropriate)
Introduction (Scope and Background)
Key Factors that impact on passenger 'satisfaction'
Tasks (with Technical Details and Independent Research)
Recommendations
Next steps
References
Appendix
The sections in bold contribute to the word count of 2,500 words
Locate the report file and embed your pre-run Python notebook. If you are unable to embed your Python notebook in your MS Word document for any reason, you must provide a shared link to the file.