Reference no: EM133559546
Overview
For this assignment, students will work in groups of 2 or 3. Each group needs to choose a real dataset that the group members find interesting, in the sense that they believe it contains data which can provide useful information if explored. Students then need to implement, in the R programming language, different techniques covered in this unit to find the best way to answer their questions about the dataset and extract the useful information.
There are numerous datasets available online, and one good source is the UCI Machine Learning Repository.
You are free, however, to choose any dataset you prefer, provided that:
1. The dataset must be freely available online so that I can download it and perform the analysis myself.
2. Students must each choose unique projects - this generally means different datasets entirely.
If you decide to work in a group of 3, you need to work on 2 datasets.
If you have another preferred source of data, you may request to use it instead and I'll have a look. I can also propose other datasets if students need additional choices. Having decided on a dataset, you should then post your plans on the discussion forum for other students to view and comment on. This discussion is assessed.
Your results from applying the techniques learned in this unit to the dataset should then be described and explained to the reader. The report does not require lengthy text sections, and much of the content may consist of the results, the analysis of the results, and/or graphs or plots as required.
In conjunction with the submission of the report, students will also present an overview of the findings, as explained below.
Deliverables:
1. Online Discussion forum: Post your proposed topic and chosen dataset as well as a short plan for the project. State whether it falls into the supervised or unsupervised learning category and whether it is a regression or classification problem. The above is required for approval of the topic. As discussed, students must select unique topics; any assignments that overlap will not be accepted. This should be done by the end of week 10. Also, any queries about the assignment deliverables should be made in the discussion forum so that other students can also benefit from the responses.
2. Oral Presentation: You will be required to present a brief (10-minute) executive summary of your project in class. This is a mandatory component of the assignment.
3. Data Mining technical report: The marks for the report section are split into three areas:
a. Data understanding and preparation
b. Algorithms/techniques chosen and implemented in the R programming language for data analysis
c. Presentation, discussion and quality of the results - explanation of interesting patterns found
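As a rough illustration of how the three report areas above might map onto R code, the following minimal sketch uses the built-in iris data purely as a stand-in. It is an assumption for illustration only: it is not one of the datasets groups are expected to choose, and the specific techniques shown (a linear regression and k-means clustering) are examples rather than required methods.

```r
# Illustrative sketch only: iris stands in for the group's chosen dataset.
set.seed(42)                      # make the train/test split reproducible

# a. Data understanding and preparation
data(iris)
str(iris)                         # variable types and dimensions
summary(iris)                     # ranges, quartiles, class counts
stopifnot(!anyNA(iris))           # this sketch assumes no missing values

train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]        # 70% of rows for fitting
test  <- iris[-train_idx, ]       # remaining rows for evaluation

# b. Techniques implemented in R
# Supervised, regression: predict a numeric response.
reg_fit  <- lm(Petal.Width ~ Petal.Length + Sepal.Length, data = train)
reg_pred <- predict(reg_fit, newdata = test)
rmse     <- sqrt(mean((test$Petal.Width - reg_pred)^2))

# Unsupervised: cluster the numeric columns without using the labels.
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 20)

# c. Presentation and discussion of results
cat("Test RMSE:", round(rmse, 3), "\n")
print(table(cluster = km$cluster, species = iris$Species))
plot(iris$Petal.Length, iris$Petal.Width, col = km$cluster,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)",
     main = "k-means clusters (k = 3)")
```

In a real report, each step would be followed by a short explanation of what the output shows and why the chosen technique suits the question being asked of the dataset.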
Attachment: Foundations of Data Science.rar