BDA601 Big Data and Analytics Assignment


BDA601 Big Data and Analytics - Laureate International Universities

Assessment - Model Evaluation

Learning Outcome 1: Apply data science principles to the cleaning, manipulation and visualisation of data;
Learning Outcome 2: Design analytical models based on a given problem; and
Learning Outcome 3: Effectively report and communicate findings to an appropriate audience.

Task Summary
Any enterprise-level big data analytics project aimed at solving a real-world problem generally comprises three phases:
1. Data preparation;
2. Data analysis and visualisation; and
3. Making decisions based on the analysis or insights.
In this Assessment, you will help the global community in its fight against COVID-19 by discovering meaningful insights in a dataset compiled by the Johns Hopkins University Center for Systems Science and Engineering.
Given the significance of the issue, you will slice and dice the data using different methods and drill down to gain insights that will help those responsible make informed decisions.

Task Instructions

1. Dataset Preparation

The Johns Hopkins University COVID-19 dataset is a time-series dataset that officially began recording the global number of confirmed infections, deaths and recovered patients on 22 January 2020. Its columns are Province/State, Country/Region, Lat and Long (the latitude and longitude of each location), followed by one column per date holding the cumulative count up to that day. Countries reported by province or state appear on several rows. The data period runs from 22 January 2020 to present.

In this Assessment, you are required to work with the latest version of this dataset (the version you use will depend on the day you download it). The dataset is available from the source cited below.

For this Assessment, you are only required to download the dataset related to confirmed infection numbers (i.e., only download the file named: time_series_covid19_confirmed_global.csv).

All of the analyses for this Assessment should be conducted on the confirmed infection numbers. Use the dataset as is, without making any modifications to the downloaded file.

Humdata.org. (2020). Novel Coronavirus (Covid-19) cases data.
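A minimal sketch of loading the file without modifying it, assuming pandas is available. The embedded sample is hypothetical and only mimics the file's layout (Province/State, Country/Region, Lat, Long, then one cumulative column per date); point the hypothetical `load_confirmed` helper at your downloaded copy instead. All aggregation happens in memory, so the file itself stays untouched.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt in the same wide layout as
# time_series_covid19_confirmed_global.csv (values are made up).
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Alpha,10.0,20.0,1,2,4
North,Beta,30.0,40.0,0,1,1
South,Beta,31.0,41.0,2,3,5
"""


def load_confirmed(source) -> pd.DataFrame:
    """Return cumulative confirmed counts with one row per date and one column per country."""
    raw = pd.read_csv(source)  # read as-is; nothing is written back to disk
    date_cols = list(raw.columns[4:])  # every column after Lat/Long is a date
    # Countries reported by province/state appear on several rows: sum them.
    by_country = raw.groupby("Country/Region")[date_cols].sum()
    by_date = by_country.T  # rows = dates, columns = countries
    by_date.index = pd.to_datetime(by_date.index, format="%m/%d/%y")
    return by_date


confirmed = load_confirmed(io.StringIO(SAMPLE_CSV))
print(confirmed)
```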

2. Data Analysis and Visualisation
Using the dataset downloaded in the previous step, analyse and visualise the data for the three countries with the most confirmed infections.
Select these three countries based on the total count of confirmed infections from 22 January 2020 to the latest date in your file.
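Because each date column in this file is a running total, a country's total from 22 January 2020 to the latest date is simply its value in the last date column (after summing its provinces); adding up all the date columns would count the same cases many times. A minimal pandas sketch, assuming the date-indexed, country-per-column layout; `top_countries` is a hypothetical helper and the demo numbers are made up:

```python
import pandas as pd


def top_countries(by_date: pd.DataFrame, n: int = 3) -> list:
    """Rank countries by their cumulative count on the latest date in the table."""
    latest = by_date.sort_index().iloc[-1]  # last row = most recent date
    return latest.nlargest(n).index.tolist()


# Hypothetical cumulative counts: rows = dates, columns = countries.
dates = pd.date_range("2020-01-22", periods=3, freq="D")
demo = pd.DataFrame(
    {"A": [1, 5, 9], "B": [0, 2, 30], "C": [3, 3, 4], "D": [0, 10, 20]},
    index=dates,
)
print(top_countries(demo))  # ['B', 'D', 'A']
```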
The analysis and visualisation can be completed using the Python libraries of your choice (e.g., PySpark MLlib), or on any other platform if you find it more efficient. Collectively, the analysis and visualisation should address the following sections:

a) Predictive Modelling
In this section, fit a linear regression model to the time-series data for each of the three countries, assuming that the infection rate has been increasing since the official record started. In this model, the dependent variable is the infection count and the independent variable is the week number.
Convert the time-series data so that dates are represented as week numbers. For example, 22 January 2020 to 28 January 2020 will be Week 1, 29 January 2020 to 4 February 2020 will be Week 2, etc.
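The week numbering above can be computed from each date's offset from 22 January 2020. A minimal sketch assuming pandas; the hypothetical `weekly_counts` helper also collapses a daily cumulative series to one value per week by keeping the last day of each week, which is one reasonable reading of "count of infection" (applying `.diff()` to its result would give new cases per week instead).

```python
import pandas as pd

START = pd.Timestamp("2020-01-22")  # first day of the JHU series = first day of Week 1


def week_number(date) -> int:
    """22-28 Jan 2020 -> 1, 29 Jan-4 Feb 2020 -> 2, and so on."""
    return (pd.Timestamp(date) - START).days // 7 + 1


def weekly_counts(daily: pd.Series) -> pd.Series:
    """Keep the cumulative count on the last available day of each week, indexed by week number."""
    weeks = [week_number(d) for d in daily.index]
    return daily.groupby(weeks).last()


# Hypothetical 14 days of cumulative counts (0, 1, ..., 13).
daily = pd.Series(range(14), index=pd.date_range("2020-01-22", periods=14, freq="D"))
print(weekly_counts(daily))  # week 1 -> 6, week 2 -> 13
```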
Once all three linear regression models are ready, analyse the models thoroughly and identify the model with the highest variance. Select that country and its linear regression model and move to the next step.
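The brief does not say which variance to compare, so this sketch (assuming NumPy, with hypothetical weekly counts) reports both the variance of each country's weekly counts and the variance of the regression residuals, alongside the slope and R². It selects by `count_variance`; swap the key if your analysis argues for residual variance instead. If you work in PySpark, `pyspark.ml.regression.LinearRegression` exposes comparable figures on its training summary (for example `r2` and `explainedVariance`).

```python
import numpy as np


def fit_linear(weeks, counts):
    """Ordinary least-squares fit of count = slope * week + intercept, plus simple diagnostics."""
    x = np.asarray(weeks, dtype=float)
    y = np.asarray(counts, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "count_variance": float(np.var(y)),  # spread of the observed weekly counts
        "residual_variance": float(np.var(residuals)),  # spread the straight line leaves unexplained
    }


# Hypothetical weekly cumulative counts for three countries.
weeks = [1, 2, 3, 4, 5]
data = {
    "X": [10, 20, 30, 40, 50],
    "Y": [5, 40, 45, 120, 200],
    "Z": [1, 2, 2, 3, 4],
}
models = {country: fit_linear(weeks, counts) for country, counts in data.items()}
selected = max(models, key=lambda c: models[c]["count_variance"])
print(selected)  # Y
```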

b) Clustering
In this section, perform K-Means clustering on the data used in the previous step for the country whose model had the highest variance.

In the previous step, one of the assumptions was that the infection rate has been increasing since the official record started. Clustering should help you validate that assumption and, most importantly, discover how the infection count trends over time.
Determine the best value of K for K-Means clustering through iteration. Once the clusters stabilise, analyse the clusters thoroughly and observe the trend over time.
For example, consider whether there were clusters at the top of the graph in the first weeks of January, whether those clusters came back down in the following weeks and whether they went up again. You will use these observations in the next step.
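A minimal K-Means sketch, written with NumPy only so it runs without extra packages (scikit-learn's `KMeans` or PySpark's `pyspark.ml.clustering.KMeans` would do the same job). It min-max scales the (week, count) pairs so that neither axis dominates the distance, records the within-cluster sum of squares (inertia) for K = 1 to 6 so you can look for an "elbow" (one common way to pick K by iteration), and then lists each cluster's week range and mean count in time order, which shows the kind of rise, dip and second rise described above. The helper names and weekly counts are hypothetical.

```python
import numpy as np


def scale(weeks, counts):
    """Min-max scale (week, count) pairs so both features lie in [0, 1]."""
    X = np.column_stack([weeks, counts]).astype(float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid dividing by zero for a constant feature
    return (X - lo) / span


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's K-Means with k-means++ seeding; returns (labels, centroids, inertia) of the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++: pick each new seed with probability proportional to squared distance.
        centroids = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
            p = d2 / d2.sum() if d2.sum() > 0 else np.full(len(X), 1.0 / len(X))
            centroids.append(X[rng.choice(len(X), p=p)])
        centroids = np.array(centroids)
        for _ in range(max_iter):
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)  # assign each point to its nearest centroid
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])  # empty clusters keep their old centroid
            converged = np.allclose(new, centroids)
            centroids = new
            if converged:
                break
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        inertia = float(dists.min(axis=1).sum())  # within-cluster sum of squares
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best


# Hypothetical weekly counts: low, a rise, a dip, then a second rise.
weeks = np.arange(1, 13)
counts = np.array([5, 6, 7, 50, 52, 55, 20, 22, 21, 90, 95, 99])
X = scale(weeks, counts)

inertias = {k: kmeans(X, k)[2] for k in range(1, 7)}  # plot these to look for an elbow
labels, _, _ = kmeans(X, 4)
for j in sorted(set(labels), key=lambda c: weeks[labels == c].min()):
    member = labels == j
    print(f"weeks {weeks[member].min()}-{weeks[member].max()}: mean count {counts[member].mean():.1f}")
```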

c) Graph Analytics
In this section, perform graph analytics and show the relationship between the country selected in the previous step and its neighbouring countries based on the weekly count of infection. Assume that the neighbouring countries do not share any borders with each other.
To determine the neighbouring countries, use either the latitude and longitude information in the dataset or your own knowledge of geography, and present the result as a graphical view.
As part of this analysis, assume that the neighbouring countries may also display similar cluster trends over a period (as seen in the previous step). In your video presentation, you will make recommendations to these neighbouring countries in relation to possible trends.
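One way to sketch this graph in plain Python: rank candidate neighbours by great-circle (haversine) distance between the dataset's Lat/Long points, then build a star graph whose only edges join the selected country to each neighbour, matching the brief's assumption that neighbours do not border one another. Each edge is weighted by the Pearson correlation of weekly counts, as one simple measure of how similar the trends are. The helper names, coordinates and counts are hypothetical; the dataset gives one point per row rather than a border, so nearest points are only a proxy for neighbours and are worth checking against a map. In PySpark, the same vertex and edge lists could be loaded into the GraphFrames package.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # 6371 km = mean Earth radius


def nearest(centre, coords, n):
    """Return the n countries whose coordinates lie closest to the centre country."""
    lat0, lon0 = coords[centre]
    others = [c for c in coords if c != centre]
    return sorted(others, key=lambda c: haversine_km(lat0, lon0, *coords[c]))[:n]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def star_graph(centre, neighbours, weekly):
    """Edge list of (centre, neighbour, weight); there are no edges between neighbours."""
    return [(centre, nb, pearson(weekly[centre], weekly[nb])) for nb in neighbours]


# Hypothetical coordinates and weekly counts.
coords = {"P": (0.0, 0.0), "Q": (0.0, 1.0), "R": (2.0, 0.0), "S": (10.0, 10.0)}
weekly = {"P": [1, 2, 3, 4], "Q": [2, 4, 6, 8], "R": [4, 3, 2, 1], "S": [1, 1, 2, 1]}

neighbours = nearest("P", coords, n=2)
for edge in star_graph("P", neighbours, weekly):
    print(edge)
```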

d) Visualisation
In this section, visualise the analytical findings you derived in the steps above.
In big data and analytics projects, visualisation is an integral part of any analysis and often brings the analysis to life. Thus, ensure that you produce a high-quality visualisation, which you can use to tell stories and drill down from the raw data to the decision-making process.

3. Video Presentation

After completing the whole data analysis and visualisation process, the outcomes need to be communicated to the neighbouring countries as identified in the previous step. Thus, you should prepare a video presentation summarising the insights discovered in the previous step. You should use 8-10 slides in your presentation and your presentation should be no longer than 10 minutes.

This video presentation relates to the big data and analytics project phase ‘making decisions based on the analysis or insights’ (as described above). Thus, the contents of this video should be extremely helpful to the neighbouring countries as they make decisions about their COVID-19 policies.

Consequently, as you communicate possible infection trends, support your findings with the insights you discovered through predictive modelling, clustering, graph analytics and visualisation. Tell a story to your listeners by presenting drilled-down views of your discoveries and by relating all the outcomes of the analysis you completed in the previous steps: predictive modelling, clustering, graph analytics and visualisation.

