How to balance a dataset in Spark

Reference no: EM133189666

Question: The COVID-19 pandemic has been devastating for hospitals, whose limited resources can be stretched thin. One area under investigation is the use of simulators that estimate the number of active cases a hospital is likely to see. Such a simulator uses machine learning algorithms to predict the number of patients who may be admitted, which helps hospitals forecast their resource needs. In this assignment, you will run a machine learning algorithm in Spark that predicts fatalities.

Using the above dataset, write a Spark machine learning algorithm to predict fatality rates in the Toronto area.

Note that since the majority of COVID-19 cases result in recovery, this dataset is not balanced. For example, an algorithm that simply marks all cases as "resolved" would be 99% accurate (since 99% of the cases are "resolved") even though it did not predict a single fatality correctly! As a result, you cannot use the dataset as is and must balance it first. We did not go over the concept of balancing in the Spark Machine Learning lessons, but you should have been exposed to this concept in other courses. You will therefore need to investigate how to balance a dataset in Spark.
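One common way to balance a dataset is to undersample the majority class. The sketch below is an illustration under stated assumptions, not the assignment's prescribed method. It first reproduces the 99% accuracy trap on made-up labels, then computes per-class sampling fractions that can be passed to Spark's `DataFrame.sampleBy`. The label values "resolved" and "fatal", the `label_col` parameter, and the helper names are hypothetical, since the actual dataset schema is not shown here.

```python
from collections import Counter


def always_resolved_accuracy(labels, majority="resolved"):
    """Accuracy of a model that predicts `majority` for every case."""
    return sum(1 for y in labels if y == majority) / len(labels)


def undersample_fractions(counts):
    """Per-class keep fractions that shrink each class toward the rarest class's size."""
    smallest = min(counts.values())
    return {label: smallest / n for label, n in counts.items()}


def balance_by_undersampling(df, label_col, seed=42):
    """Sketch for a Spark DataFrame `df`; `label_col` is an assumed column name."""
    counts = {row[label_col]: row["count"]
              for row in df.groupBy(label_col).count().collect()}
    # sampleBy draws a stratified sample without replacement, keeping each
    # row with the probability listed for its class, so the resulting class
    # sizes are approximately (not exactly) equal.
    return df.sampleBy(label_col, fractions=undersample_fractions(counts), seed=seed)


# Made-up labels that mirror the 99% / 1% split described above.
labels = ["resolved"] * 990 + ["fatal"] * 10
print(always_resolved_accuracy(labels))        # 0.99
print(undersample_fractions(Counter(labels)))  # "fatal" is kept whole, "resolved" is cut down
```

Undersampling discards most of the majority rows. Two alternatives worth investigating are oversampling the minority class and keeping all rows while supplying per-row class weights through the `weightCol` parameter that several Spark ML classifiers, such as `LogisticRegression`, accept.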

Once you have a balanced dataset, you can run your algorithm on it and report your accuracy.
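When reporting accuracy, it may help to show recall on the rare class next to it, since accuracy alone can hide a model that never predicts a fatality. The following is a minimal sketch, assuming string labels "resolved" and "fatal" for the plain-Python helper and a numeric `label` column (for example, one produced by `StringIndexer`) for the Spark helper. All function names and column names are hypothetical.

```python
def class_metrics(y_true, y_pred, positive="fatal"):
    """Accuracy plus recall for the rare class, computed from plain lists."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Predictions made for the cases whose true label is the rare class.
    preds_for_positive = [p for t, p in zip(y_true, y_pred) if t == positive]
    recall = (sum(1 for p in preds_for_positive if p == positive)
              / len(preds_for_positive)) if preds_for_positive else 0.0
    return {"accuracy": correct / len(y_true), "recall": recall}


def spark_accuracy(predictions, label_col="label"):
    """Accuracy of a Spark ML predictions DataFrame (column names are assumptions)."""
    # Imported lazily so the pure-Python helper above works without Spark installed.
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    evaluator = MulticlassClassificationEvaluator(
        labelCol=label_col, predictionCol="prediction", metricName="accuracy")
    return evaluator.evaluate(predictions)


# Made-up predictions: one of the two fatalities is missed.
y_true = ["fatal", "fatal", "resolved", "resolved"]
y_pred = ["fatal", "resolved", "resolved", "resolved"]
print(class_metrics(y_true, y_pred))  # {'accuracy': 0.75, 'recall': 0.5}
```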

1. Your machine learning algorithm code in Spark (as a simple text file)






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd