Reference no: EM133516712
Data Analytics
Assessment Report - Case Studies
Overview
A data analytics project starts with collecting data and ends with communicating the results. In between, several steps must be followed, and data preprocessing is one of the most important. Preprocessing itself comprises multiple steps, depending on the nature, type, and values of the data.
Data visualisation, in turn, uses visual representations such as charts, graphs, and illustrations to explore, make sense of, and communicate data. Many large companies are now moving towards visualisation.
Assessment Details
Case Study 1: Students are required to select a dataset for a regression task and define a question based on a business requirement. This should include: (i) selecting the dataset; (ii) exploring, summarising and preparing the data; (iii) defining the problem and requirements; (iv) defining an experiment setup; (v) implementing your approach; and (vi) evaluating and analysing the approach.
- Problem: Describe the problem and highlight the business need.
- Approach: Describe your approach. It should cover, for example, learning techniques, features, model tuning, parameter selection, and analysis (e.g., how the analysis will answer your questions).
- Results: Summarise the results and critically analyse them, e.g., limitations of the data, setup or approach, characteristic errors, and possible improvements.
- Conclusion: Conclude with what you have learned from this study that would help you improve as a data analyst. Would you recommend this as a solution to your problem? Provide reasons.
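As an illustration only, the experiment setup, implementation, and evaluation steps of Case Study 1 could be sketched as follows. The brief does not prescribe a dataset, model, or metric, so everything here is an assumption: the data are synthetic, the model is a one-feature least-squares line, and the helper names (`train_test_split`, `fit_simple_linear`, `evaluate`) are hypothetical.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_linear(train):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate(model, test):
    """Return RMSE and R^2 of the fitted line on held-out data."""
    slope, intercept = model
    residuals = [y - (slope * x + intercept) for x, y in test]
    mean_y = sum(y for _, y in test) / len(test)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for _, y in test)
    rmse = math.sqrt(ss_res / len(test))
    r2 = 1 - ss_res / ss_tot
    return rmse, r2


# Synthetic (assumed) data: y = 3x + 5 plus Gaussian noise.
noise = random.Random(42)
data = [(float(x), 3.0 * x + 5.0 + noise.gauss(0, 1.0)) for x in range(100)]

train, test = train_test_split(data)
model = fit_simple_linear(train)
rmse, r2 = evaluate(model, test)
print(f"slope={model[0]:.3f} intercept={model[1]:.3f}")
print(f"test RMSE={rmse:.3f} test R^2={r2:.4f}")
```

Evaluating on held-out rows rather than the training rows is the design choice that matters here: it gives an estimate of how the model behaves on unseen data, which is what the Results and Conclusion sections are asked to analyse.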
Case Study 2: Suppose that you have built a classifier that can identify whether an email is spam or not spam. After applying the classifier to the training data, you get the following confusion matrix.
- Calculate the accuracy, true positive rate, true negative rate, precision, and recall.
- Based on the accuracy value, do you think the classifier is doing a good job of identifying spam emails? Justify your answer.
- What is the class imbalance problem? How does it affect the accuracy in the given scenario?
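The metrics requested in Case Study 2 follow directly from the four cells of a confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical and are not the values from the confusion matrix in this assessment; they were chosen to be imbalanced (50 spam, 950 not spam) so the effect on accuracy is visible. The function name `confusion_metrics` is also an assumption.

```python
def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp, fn, fp, tn):
    """Standard metrics, treating 'spam' as the positive class."""
    return {
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "true_positive_rate": _ratio(tp, tp + fn),  # same as recall
        "true_negative_rate": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
    }


# Hypothetical counts (NOT from the assessment): 50 spam, 950 not spam.
example = confusion_metrics(tp=10, fn=40, fp=10, tn=940)

# A trivial baseline that labels every email "not spam".
baseline = confusion_metrics(tp=0, fn=50, fp=0, tn=950)

for name, value in example.items():
    print(f"{name}: {value:.3f}")
print(f"baseline accuracy: {baseline['accuracy']:.3f}")
```

With these assumed counts the classifier reaches 95% accuracy while detecting only 20% of spam, and the baseline that never predicts spam reaches the same accuracy. This is why accuracy alone can mislead when one class dominates.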