Reference no: EM133311680
Data Management and Visualization
Task Description
Datasets vary from one domain to another. In this coursework, you will select a dataset related to a real-world problem that best suits your area of interest. Many websites provide publicly available datasets. A categorised list of datasets can be found on GitHub. The UCI Data Repository is another long-standing source of benchmark datasets for data analysis research. Kaggle also hosts interesting real-world problems and datasets.
You can select a dataset from the above sources, or another one that is available online. The dataset must be publicly available and of a reasonable size. You have to complete the following stages in this assignment:
1. Import a real-life dataset.
2. Identify the insights that the dataset can potentially provide.
3. Data exploration and preparation: The nature of the dataset may dictate some data exploration and preparation that can help inform decision making.
4. Perform necessary data manipulation.
5. Perform basic exploratory data analysis.
6. Use appropriate visualisation for the results.
7. Critically evaluate and interpret the results and how they can support business decision making.
8. Reflect on professional, ethical and legal issues in relation to the problem and the data set.
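As a rough illustration only, stages 1 and 3 to 6 might look like the following in base R. This sketch assumes R's built-in airquality data as a stand-in for a dataset of your own choice. The CSV round trip, the variable names (aq, aq_clean, mean_ozone) and the choice of a box plot are illustrative assumptions, not requirements of the coursework.

```r
# Stage 1: import. 'airquality' ships with base R (daily air quality
# measurements, New York, 1973). It is written to a temporary CSV and read
# back only to mimic importing a downloaded file (an assumption of this sketch).
csv_path <- tempfile(fileext = ".csv")
write.csv(airquality, csv_path, row.names = FALSE)
aq <- read.csv(csv_path)

# Stage 3: explore the structure and count missing values per column.
str(aq)
missing_per_column <- colSums(is.na(aq))
print(missing_per_column)

# Stage 4: manipulation. Keep complete rows only and label the months.
aq_clean <- aq[complete.cases(aq), ]
aq_clean$MonthName <- factor(month.abb[aq_clean$Month],
                             levels = month.abb[5:9])

# Stage 5: basic exploratory analysis.
print(summary(aq_clean$Ozone))
mean_ozone <- aggregate(Ozone ~ MonthName, data = aq_clean, FUN = mean)
print(mean_ozone)

# Stage 6: visualisation. The plot is written to a temporary PDF so the
# script also runs non-interactively.
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
boxplot(Ozone ~ MonthName, data = aq_clean,
        xlab = "Month", ylab = "Ozone (ppb)",
        main = "Ozone by month (complete cases only)")
invisible(dev.off())
```

Dropping incomplete rows is only one possible way to handle missing values. Whichever approach you take should be justified in the report as a design decision.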
The report will be assessed on:
» understanding of different tools in R
» review of relevant literature
» development methodology
» justification of design decisions
» consideration of professional, ethical and legal issues
The report could broadly include the following sections:
• Abstract
• Introduction (introduce the dataset and the significance of the insights embedded in it)
• Literature review of related work
• Data exploration
• Experiments (data preparation, manipulation, analysis, visualization)
• Results
• Discussion, Conclusions and Future Work
• References