COMP 5070 Statistical Programming For Data Science - University of South Australia
- You must submit an R code file and a Word (or PDF) document containing a well-presented report. No archive files are allowed, that is, no zip, rar, tar, 7z, etc.
1. Code all requested components
2. Aim for optimised code in terms of computational overhead (5%). It is not always possible to avoid loops; however, you should avoid them where possible.
3. Use a clear coding style (5%). Code clarity is an important part of your submission, so choose meaningful variable names and use comments. You don't need to comment every single line, as this hurts readability, but you should comment at least each section of code.
4. Have the code run successfully (5%).
5. Output the information in a presentable manner as decided by yourself and present the requested statistical analyses/discussions (35%).
6. Document code limitations including, but not limited to, the requested functionalities (5%).
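As an illustration of the "avoid loops where possible" criterion, the sketch below computes a running total twice: once with an explicit loop and once with a vectorised call. The data and variable names are made up for the example and are not taken from the assignment dataset.

```r
# Toy data: daily new cases for one hypothetical country (made-up numbers).
new_cases <- c(3, 5, 0, 12, 7)

# Loop version: builds the running total one element at a time.
total_loop <- numeric(length(new_cases))
running <- 0
for (i in seq_along(new_cases)) {
  running <- running + new_cases[i]
  total_loop[i] <- running
}

# Vectorised version: cumsum() does the same work in a single call.
total_vec <- cumsum(new_cases)

# Check that both approaches agree.
same_result <- identical(total_loop, total_vec)
```

The vectorised form is shorter and usually faster on long vectors, which is the kind of choice this marking criterion appears to reward.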
• Assignments submitted late, without an extension being granted, will attract a penalty of 10 marks for each day, or part thereof, beyond the due date and time.
Data Analysis, Visualisation and Interpretation
Data and Background Information
The dataset contains information related to COVID-19 for more than 200 countries.
Most variable names are self-explanatory, but there are many of them and you don't need all of them for the task. A full description of all variables is available in the codebook or online. It is very important to review this description so that you correctly understand how each variable is measured.
If you have any doubts, feel free to ask on the forum or by email.
Research report
For this assignment, you will need to produce a report summarising a collection of requested statistical analyses and visualisations of the data. See below for details. You will need to submit a properly written, well-presented report and an R script file.
As a guideline, excluding tables/figures, 2-3 pages of writing will be sufficient for the report. I won't strictly count words so if you go over/under - that's fine, but this is a good ballpark to aim for.
The report should contain:
1. An introduction outlining the analysis to follow, background information, and the available data.
2. What are the total numbers of COVID-19 cases in the following countries: Australia, China, India, New Zealand, Sweden, Ukraine, United Kingdom, United States? Provide the numbers and an appropriate graph. Analyse the progress of total cases for these countries and produce a graph similar to those that appeared in the media:
70 days as above and your graph should be identical to the example. Once you are confident that your graph is correct, extend it to the full history.
Grey dashed lines "doubles in ... days" are not required.
3. Provide and discuss descriptive statistics for the variable "gdp_per_capita", then use it to split all countries into three groups: "rich" countries, "average" countries and "poor" countries. Compare the numbers of deaths and the numbers of cases per month for these three groups of countries.
Hint: the data set is daily based; however, GDP does not change every day. Also, some countries are small, while other countries are large. Prepare your data correctly. You must compare apples to apples.
4. COVID-19 was declared a pandemic. However, there might be another pandemic no one talks about: cardiovascular diseases. Study the distribution of cardiovascular diseases (variable "cardiovasc_death_rate") overall and compare it to the COVID-19 death rate. Repeat this comparison for the three groups of "rich", "average" and "poor" countries.
5. The COVID-19 mortality rate is the ratio of the number of deaths to the number of cases. Different countries might have different COVID-19 mortality rates. Calculate and report overall statistics for mortality rates. Then select one variable from those available in the data that might help explain the difference in mortality rates. Run an appropriate analysis to demonstrate the relationship.
6. Conclusions summarising all your research findings.
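For item 2 above, one possible way to prepare a media-style progress chart is sketched here. It assumes the chart plots cumulative cases on a logarithmic axis against the number of days since each country crossed a threshold; the threshold of 100 cases and the column names `location`, `date` and `total_cases` are assumptions for illustration, not details confirmed by the brief. The data are made up.

```r
# Toy cumulative case counts (made-up numbers); column names are assumptions.
covid <- data.frame(
  location    = rep(c("A", "B"), each = 5),
  date        = rep(as.Date("2020-03-01") + 0:4, times = 2),
  total_cases = c(40, 90, 150, 300, 620,   # hypothetical country A
                  10, 120, 200, 380, 400)  # hypothetical country B
)

threshold <- 100  # hypothetical alignment threshold, not taken from the brief

# Keep only the days on which each country was at or above the threshold.
aligned <- covid[covid$total_cases >= threshold, ]

# Days elapsed since the first above-threshold day, computed per country.
first_day <- ave(as.numeric(aligned$date), aligned$location, FUN = min)
aligned$days_since <- as.numeric(aligned$date) - first_day

# Empty plot with a logarithmic y-axis, then one line per country.
plot(NULL, xlim = range(aligned$days_since), ylim = range(aligned$total_cases),
     log = "y", xlab = "Days since threshold reached", ylab = "Total cases")
for (country in unique(aligned$location)) {
  one <- aligned[aligned$location == country, ]
  lines(one$days_since, one$total_cases)
}
```

Restricting `aligned` to a maximum value of `days_since` would give a fixed-length window, and dropping that restriction would show the full history.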
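For item 3 above and its hint, the following sketch shows one way to compare like with like: reduce the daily data to one row per country before grouping by GDP, then scale monthly counts by population. Splitting at the 1/3 and 2/3 quantiles is only one possible definition of "rich", "average" and "poor". Every column name other than `gdp_per_capita` is an assumption about the dataset, and the numbers are made up.

```r
# Toy daily data (made-up numbers); every column name except gdp_per_capita
# is an assumption about the dataset.
covid <- data.frame(
  location       = rep(c("A", "B", "C"), each = 4),
  date           = rep(as.Date(c("2020-03-30", "2020-03-31",
                                 "2020-04-01", "2020-04-02")), times = 3),
  new_cases      = c(10, 20, 30, 40,  5, 5, 10, 10,  1, 2, 3, 4),
  population     = rep(c(1e6, 5e5, 2e5), each = 4),
  gdp_per_capita = rep(c(50000, 12000, 1500), each = 4)
)

# GDP is constant within a country, so reduce to one row per country first.
countries <- unique(covid[, c("location", "population", "gdp_per_capita")])

# Split at the 1/3 and 2/3 quantiles (one possible definition of the groups).
breaks <- quantile(countries$gdp_per_capita, probs = c(0, 1/3, 2/3, 1),
                   na.rm = TRUE)
countries$wealth <- cut(countries$gdp_per_capita, breaks = breaks,
                        labels = c("poor", "average", "rich"),
                        include.lowest = TRUE)

# Attach the group to the daily rows and derive a year-month key.
covid <- merge(covid, countries[, c("location", "wealth")], by = "location")
covid$month <- format(covid$date, "%Y-%m")

# Monthly cases per group, scaled by the group's total population
# so that large and small countries are comparable.
monthly <- aggregate(new_cases ~ wealth + month, data = covid, FUN = sum)
group_pop <- aggregate(population ~ wealth, data = countries, FUN = sum)
monthly <- merge(monthly, group_pop, by = "wealth")
monthly$cases_per_million <- monthly$new_cases / monthly$population * 1e6
```

The same pattern would apply to deaths by swapping in the corresponding daily deaths column. With real data, missing GDP values and tied quantile breaks would need handling before `cut()` is called.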
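For item 5 above, the mortality rate and its relationship to one explanatory variable could be explored roughly as follows. The column names `total_cases` and `total_deaths`, and the choice of `median_age` as the explanatory variable, are hypothetical; correlation and simple linear regression are just two of several analyses that might be appropriate. The numbers are made up.

```r
# Toy country-level snapshot (made-up numbers); column names are assumptions.
latest <- data.frame(
  location     = c("A", "B", "C", "D", "E"),
  total_cases  = c(1000, 5000, 200, 8000, 0),
  total_deaths = c(20, 150, 2, 400, 0),
  median_age   = c(30, 41, 25, 45, 38)   # hypothetical explanatory variable
)

# Mortality rate as defined in the brief: deaths divided by cases.
# Countries with zero cases get NA instead of a division-by-zero result.
latest$mortality_rate <- ifelse(latest$total_cases > 0,
                                latest$total_deaths / latest$total_cases,
                                NA)

# Overall descriptive statistics (NA values are reported separately).
rate_summary <- summary(latest$mortality_rate)

# Strength of the linear association with the chosen variable;
# incomplete pairs are dropped by cor.test().
cor_result <- cor.test(latest$mortality_rate, latest$median_age)

# A simple linear model as one way to describe the relationship.
fit <- lm(mortality_rate ~ median_age, data = latest)
slope <- coef(fit)[["median_age"]]
```

A scatter plot of the two variables alongside `cor_result` and `fit` would make the relationship easier to discuss in the report.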