Advanced and integrated understanding of data modelling

Reference no: EM133547483 , Length: 15 pages

Big Data Management

Learning Outcome 1: Demonstrate advanced and integrated understanding of data modelling, storage, and retrieval methods and apply knowledge and skills to retrieve information from data storage;

Learning Outcome 2: Apply knowledge and skills to design and complete a project to coordinate and manage large data sets;

Learning Outcome 3: Analyse critically and interpret the knowledge from large data sets;

Learning Outcome 4: Interpret and transmit information and knowledge in the application discipline to specialist and non-specialist audiences;

Learning Outcome 5: Analyse critically and reflect on the issues of privacy and ethics of Big Data.

Suppose you are working for the Australian Government as a "Data Scientist" to tackle COVID-19 or any other future pandemic. Google has released a dataset on people's mobility during the pandemic. As a "Data Scientist," you have found some critical information from that dataset, which helped Australia understand COVID-19. Now, you are famous :).

So, Australia's Prime Minister, Anthony Albanese, has hired you for his special Foreign Affairs team. He wants you to compare Australia's pandemic situation with that of any other country. Luckily, you have the dataset from Google and another new dataset on COVID-19 cases, both on the government-secured server. Suppose the size of each dataset is 100 petabytes.

Therefore, you have chosen to use Spark to complete the analysis.

In this assignment, you will reuse some information from Assignments 1 and 2.

Tasks of the Assignment:

• Explore two datasets and identify a research question.
• Create Spark distributed data frames from these datasets.
• Explore, filter, and analyse the datasets using Spark.
• Based on the analysis, answer the research question.
• You must use Spark for all analysis. You may bring data into pandas only for visualisation.
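The tasks above can be sketched in PySpark as follows. This is a minimal sketch under stated assumptions, not a model answer: the file path is hypothetical, the column names are assumed to match the public Google mobility CSV, and the country codes are only the example pair mentioned in the brief. The filtering and aggregation stay in Spark, and only the small aggregated result is converted to pandas.

```python
# Sketch only. The path, column names, and country codes are assumptions.
MOBILITY_PATH = "data/Global_Mobility_Report.csv"  # hypothetical location
COUNTRIES = ["AU", "GB"]  # Australia and the UK, the example pair in the brief
METRIC = "workplaces_percent_change_from_baseline"  # assumed column name


def load_csv(spark, path):
    """Read a CSV file into a distributed Spark DataFrame."""
    return spark.read.csv(path, header=True, inferSchema=True)


def monthly_mobility(df, countries, metric=METRIC):
    """Filter and aggregate inside Spark; the result is still a Spark DataFrame."""
    # Imported here so the helpers above load even where pyspark is absent.
    from pyspark.sql import functions as F

    return (
        df.filter(F.col("country_region_code").isin(countries))
        # Assumption: national-level rows have no sub-region value.
        .filter(F.col("sub_region_1").isNull())
        .withColumn("month", F.date_format(F.to_date(F.col("date")), "yyyy-MM"))
        .groupBy("country_region_code", "month")
        .agg(F.avg(metric).alias("avg_change"))
        .orderBy("country_region_code", "month")
    )


def run(spark, path=MOBILITY_PATH):
    """Do all analysis in Spark, then hand a small table to pandas for plotting."""
    mobility = load_csv(spark, path)
    summary = monthly_mobility(mobility, COUNTRIES)
    return summary.toPandas()  # only the aggregated rows leave the cluster
```

With a session created by `SparkSession.builder.appName("covid-mobility").getOrCreate()`, calling `run(spark)` would return a pandas DataFrame with one row per country and month, which is small enough to plot locally.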

1. Introduction
• Provide a brief discussion of the mobility dataset details.
• Provide a brief discussion of the COVID-19 case (cc) dataset details.
• From where did you download the mobility dataset?

2. Data Exploration:
• Discuss the size of the mobility dataset.
• Discuss the size and format of the cc dataset.
• Discuss the format of the mobility dataset.
• Discuss the features (columns) of the mobility dataset.
• Discuss the features (columns) of the cc dataset.

3. Literature Review:
• Find at least two research works from "Google Scholar (Any preprint or published work)" where the researchers have used this mobility dataset. Please provide a brief discussion of their research. How did the researchers use this dataset to answer their research question?
• Find at least two research works from "Google Scholar (Any preprint or published work)" where the researchers have used this cc dataset. Please provide a brief discussion of their research. How did the researchers use this dataset to answer their research question?

4. Research Question/Selection of the Problem:

• Identify a research question that you can answer after analysing both datasets. The research question must focus on countries, such as Australia and the UK.
• Justify your research question. Why is your research question important for comparing the COVID-19 situation between Australia and other countries?

5. Method:
• You are using Spark as you are dealing with big data. By the way, what is Spark?
• Why did you choose Spark over Hadoop MapReduce?

6. Connection Between Datasets:
• How can you connect these two datasets to answer your research question?
• List the steps you have taken to find out the useful subset of the datasets.
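One possible way to connect the two datasets is to join them on country and date after reducing each to a comparable subset. The sketch below assumes both datasets carry a country code and a date; the cc column names (`country_code`, `date`, `new_cases`) are hypothetical because the brief does not specify that dataset's schema, and the mobility column names are assumed to match the public Google CSV.

```python
from datetime import date

JOIN_KEYS = ["country_code", "date"]  # assumed shared keys after renaming


def overlap(range_a, range_b):
    """Return the date window covered by both datasets as (start, end)."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    if start > end:
        raise ValueError("the two datasets share no dates")
    return start, end


def join_mobility_and_cases(mobility, cases, countries, window):
    """Subset both Spark DataFrames, then inner-join them on country and date."""
    # Imported here so overlap() can be used without pyspark installed.
    from pyspark.sql import functions as F

    start, end = window
    mob = (
        mobility.filter(F.col("country_region_code").isin(countries))
        .filter(F.col("sub_region_1").isNull())  # assumed national-level rows
        .select(
            F.col("country_region_code").alias("country_code"),
            F.to_date(F.col("date")).alias("date"),
            F.col("workplaces_percent_change_from_baseline"),
        )
        .filter(F.col("date").between(start, end))
    )
    cc = (
        cases.filter(F.col("country_code").isin(countries))  # hypothetical column
        .select(
            F.col("country_code"),
            F.to_date(F.col("date")).alias("date"),
            F.col("new_cases"),  # hypothetical column
        )
        .filter(F.col("date").between(start, end))
    )
    return mob.join(cc, on=JOIN_KEYS, how="inner")


# Illustrative coverage dates only; check the real ranges in your own data.
window = overlap(
    (date(2020, 2, 15), date(2022, 10, 15)),
    (date(2020, 1, 22), date(2023, 3, 9)),
)
```

Filtering each side before the join keeps the shuffled data small, which matters at the dataset sizes the scenario describes. An inner join drops days that appear in only one dataset; a left join may suit a research question that needs every mobility day.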

7. Data Analysis:
• Provide a detailed analysis with appropriate visualisations to answer the research question.

(Include relevant discussion for each visualisation.)

8. Findings:
• Provide the discussion to answer your research question based on the findings from the analysis.

9. Ethics and Privacy:
• Research Australian Law on collecting public data and show the validity of this mobility dataset according to Australian Law.
• Research Australian Law on collecting public data and show the validity of this cc dataset according to Australian Law.

10. Hosting on a server

• Create a Spark cluster in Azure and run your analysis code on that cluster. Then record a video with any screen-capture software. The recording should show that you are using Azure and that you are running your whole code on the Azure server using Spark. Upload this video to Google Drive and share the link at the end of the report or in a separate file named.

11. Presentation and Viva:

• Students need to present their work and findings. Questions will be asked at the end of the presentation.

12. Writing Style and Report Format:

• The report is clearly written, and sections are connected.
• The report follows the given structure.
• Proper and correct in-text citation is presented in the report.
• The report cannot exceed fifteen pages (the page count includes everything from the table of contents to the references and appendix). Any font of size 12pt is accepted.

Attachment:- Big Data Management.rar

Advanced and integrated understanding of data modelling : CSC6002 Big Data Management, University of Southern Queensland - Demonstrate advanced and integrated understanding of data modelling, storage

