Reference no: EM132735879, Length: 4000 words
KL7010 Principles of Data Science - Northumbria University
Assignment - The selection, application and evaluation of data science methods, tools and techniques.
Task Overview
In this assignment, you will be required to select, apply and evaluate a choice of data science methods, tools and techniques on a sizeable dataset of your choosing. You will justify the choice of dataset in terms of the problem being investigated, explore the dataset and describe and justify the methods that will be used in the investigation. After applying these methods, you will then discuss the findings that have been produced, and critically reflect upon the process and the outcomes. This documentation will take the form of a 4000-word report.
Task Scenario
You have been provided with access to three datasets; all are available on Blackboard. The data covers the following scenarios:
• Predicting if credit card clients will default on monies owed.
• Predicting the cancellation of hotel bookings.
• Predicting the occurrence of heart disease among patients.
More information on each scenario can be found in the appendix to this document.
You may choose any one of the above scenarios as your project. Your task is to produce a model that can predict the response variable within the dataset. Where possible, you will also provide insight into feature importance, that is, how much each feature contributes to the predictive capacity of your model.
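One common way to gain insight into feature importance is permutation importance: shuffle one feature at a time and measure how much the model's score drops. The brief does not prescribe this technique; the sketch below is only an illustration under stated assumptions. The names `toy_model`, `rows` and `labels` are hypothetical, the data is randomly generated rather than taken from any of the three datasets, and in a real project the scoring would normally be done on held-out data with a fitted model.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(labels)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only this column, leaving the other features intact.
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier: uses feature 0 only."""
    return 1 if row[0] > 0.5 else 0


# Synthetic, illustrative data: two features, labels driven by feature 0.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_model(row) for row in rows]

importances = permutation_importance(toy_model, rows, labels)
for index, value in enumerate(importances):
    print(f"feature {index}: mean accuracy drop = {value:.3f}")
```

Because the stand-in model ignores feature 1, shuffling that column leaves accuracy unchanged, while shuffling feature 0 degrades it. With a real model the contrast is rarely this clean, and correlated features can share or mask importance, which is worth discussing in the report.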
All three datasets have been cleaned and are ready for use; however, you may still wish to conduct some data preparation and/or transformation so that the data is in an appropriate condition and format for the analysis methods that you wish to use. You may choose any methods you wish to tackle the chosen problem, but you must justify your approach.
The key components of this task that you must complete are:
• Explore the data so that you understand the structure, characteristics and limitations of the dataset.
• Identify the forms of analysis that will be able to produce a successful outcome for the scenario. Ensure that the chosen method(s) are suitable for your chosen dataset, and justify your approach.
• Process the data into a condition suitable for the model building to be performed, including the selection of features to be used within the model.
• Build a model that allows for the response variable in the dataset to be predicted.
• Evaluate the capabilities of the model that has been developed, using at least metrics.
• Present and describe your findings in a manner suitable for the target audience.
• Critically evaluate the process and discuss the outcome of the project.
All of the above stages should be documented within the report, while all of the decisions that have been made throughout the process should be discussed and justified.
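As an illustration of the evaluation stage listed above, the following minimal sketch computes four widely used metrics for a binary response variable from a confusion matrix. It assumes the positive class is coded as 1; the label lists are made up for illustration and do not come from any of the three datasets, and the function names are hypothetical. Which metrics are appropriate depends on the scenario and should be justified in the report.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary response."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1, guarding against zero division."""
    c = confusion_counts(y_true, y_pred, positive)
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only: 1 = event occurred (e.g. default), 0 = it did not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Accuracy alone can mislead when one class is much rarer than the other, which is one reason to report precision and recall alongside it.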
Learning Outcome 1: Demonstrate critical understanding of foundations and principles of data science.
Learning Outcome 2: Demonstrate deep knowledge of fundamental statistical methods, techniques and applications in data science.
Learning Outcome 3: Critically assess, select, and apply data collection and cleaning, visualization, statistical inference, predictive modelling, and decision making for statistical analysis in the context of applied data analysis problems.
Learning Outcome 4: Critically evaluate the choice of data science techniques and tools for particular scenarios.
Learning Outcome 5: Build a critical awareness of professional, legal, cultural and ethical issues surrounding analysis, exploration, protection and dissemination in the context of your role as a data scientist.
• AO1: Demonstrate a robust theoretical knowledge of data science tools, techniques and approaches required to appropriately select and apply suitable methods to achieve a specific aim.
• AO2: Demonstrate an ability to effectively explore a dataset, and process that dataset into a condition fit for a particular use.
• AO3: Demonstrate technical proficiency in the application of data science methods, tools, and techniques, including the practical appraisal of their success through appropriate means.
• AO4: Communicate findings in an effective manner, using language, figures and visualizations appropriate for the intended audience.
• AO5: Evaluate the success of the project through critical reflection, placing particular emphasis on lessons learned and reasons behind the success - or otherwise - of the project.
• AO6: Achieve a high standard of presentation, including the structure and format of the report, the standard of spelling and grammar within, and the correct use of referencing throughout.