Reference no: EM133144344, Length: 4000 words
Principles of Data Science
Assessment - The selection, application, and evaluation of data science methods, tools, and techniques.
Learning outcome 1: Demonstrate a critical understanding of foundations and principles of data science
Learning outcome 2: Demonstrate deep knowledge of fundamental statistical methods, techniques, and applications in data science.
Learning outcome 3: Critically assess, select, and apply data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making for statistical analysis in the context of applied data analysis problems.
Learning outcome 4: Critically evaluate the choice of data science techniques and tools for particular scenarios.
Learning outcome 5: Build a critical awareness of professional, legal, cultural, and ethical issues surrounding analysis, exploration, protection, and dissemination in the context of your role as a data scientist.
Task Overview
In this assignment, you will be required to select, apply and evaluate a choice of data science methods, tools and techniques on a sizeable dataset of your choosing. You will explore the dataset and describe and justify the methods that will be used in the investigation, in terms of the problem being investigated. After applying these methods, you will then discuss the findings that have been produced, and critically reflect upon the process and the outcomes. This documentation will take the form of a 4000-word report.
Task Scenario
You have been provided with access to three datasets; all are available in the "Assessment & Submission" folder on Blackboard, along with the accompanying documentation and specific requirements for each. You may choose any one of these scenarios as your project. Using at least two different techniques, your task is to produce models that possess predictive capacity with regard to the response variable within the dataset, and to evaluate the performance of these models. Where possible, you will also provide insight into which features contribute most to the predictive capacity of your models.
All three datasets have been cleaned and are ready for use. However, you may still wish to conduct some data preparation and/or transformation so that the data is in an appropriate condition and format for the analysis methods that you wish to use. You may use any methods you wish to tackle the chosen problem; however, you must justify your approach.
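As an illustration of the kind of preparation mentioned above, the following is a minimal sketch that assumes a pandas DataFrame with a mix of numeric and categorical columns. The toy data, the column names (`age`, `region`, `target`) and the helper `prepare` are hypothetical and are not taken from the provided datasets; the appropriate transformations depend on the dataset and methods you choose.

```python
import pandas as pd

# Hypothetical toy data standing in for one of the provided datasets.
df = pd.DataFrame({
    "age": [25, 40, 31, 58],
    "region": ["north", "south", "north", "east"],
    "target": [0, 1, 0, 1],
})


def prepare(frame: pd.DataFrame, response: str) -> tuple[pd.DataFrame, pd.Series]:
    """Separate the response, standardise numeric features, one-hot encode the rest."""
    y = frame[response]
    X = frame.drop(columns=[response])

    # Standardise each numeric column to zero mean and unit variance.
    for col in X.select_dtypes(include="number").columns:
        X[col] = (X[col] - X[col].mean()) / X[col].std(ddof=0)

    # Expand categorical columns into 0/1 indicator columns.
    X = pd.get_dummies(X, dtype=float)
    return X, y


X, y = prepare(df, "target")
```

Standardising and encoding are only two of many possible steps; whichever you apply should be justified in the report with reference to the methods that follow.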
The key components of this task that you must complete are:
• Explore the data so that you understand the structure, characteristics and limitations of the dataset.
• Identify the forms of analysis that will be able to produce a successful outcome for the scenario. Ensure that the chosen method(s) are suitable for use on the dataset that you have chosen and justify the use of your chosen approach. (You may use methods that have been taught during the module as well as others that have not been used within the taught materials, as long as the choice of these methods is appropriately justified.)
• Process the data into a condition suitable for the model building to be performed, including the selection of features to be used within the model.
• Build a model that allows for the response variable in the dataset to be predicted.
• Evaluate the capabilities of the model that has been developed, using suitable metrics.
• Present and describe your findings and recommendations in a manner suitable for the target audience.
• Critically evaluate the process and discuss the outcome of the project.
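The modelling and evaluation steps listed above could be sketched as follows. This is a hedged illustration only: it assumes a binary response variable and uses synthetic data from scikit-learn in place of the provided datasets, and the two techniques shown (logistic regression and a random forest) and the chosen metrics are examples, not requirements of the brief.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); the real task uses a provided dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Hold out a test set so that evaluation uses data unseen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two different techniques, as the task scenario requires.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and record the same metrics for a like-for-like comparison.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

# Impurity-based feature importances from the random forest, highest first.
importances = models["random_forest"].feature_importances_
ranking = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)

for name, scores in results.items():
    print(name, scores)
print("Feature ranking (index, importance):", ranking)
```

If the response variable in your chosen dataset is continuous, regression models and metrics such as mean absolute error or R² would take the place of the classifiers and scores used here.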
Attachment: Principles of Data Science.rar