BUS5DWR Data Wrangling and R Assignment

Reference no: EM132544470

BUS5DWR Data Wrangling and R - La Trobe University

The purpose of this assignment is to develop and assess your skills in R programming including summarising, wrangling and plotting data. Using the tidyverse package is recommended but not compulsory. Please read through the entire assignment and understand the submission format and marking rubrics before starting.

Part 1

The spreadsheet titled ‘obesity.xlsx’ records the prevalence of obesity among adults in every country in the world. Each yearly sheet is supposed to record, for each country, the average obesity rate and its 95% confidence interval for females, males and both sexes combined. The dataset covers three years (2006, 2011 and 2016). A further sheet, Continent, lists the countries in each continent.

You will see that it is far from being ready for analysis and needs to be ‘wrangled’. Additionally, a few errors have been deliberately introduced; these will need to be corrected using your R code.

1.1 Explain why the data in its current form is not considered to be in ‘tidy’ format.

1.2 Write a function that takes a year and returns a dataframe with one or more rows, where each row gives the obesity rate for males or for females in one country in that year, together with its 95% confidence interval. The returned dataframe should have 6 columns (Country, Year, Sex, Rate, MinCI, MaxCI). The function should carry out the following steps:

a) Load the data from the worksheet of the given year into a dataframe.

b) Drop the column ‘Both sexes’.

c) Add a new column named ‘Year’ with every row set to the given year.

d) Use the gather() function to turn the data in the two columns Male and Female into rows. After this step, your dataframe should have 4 columns: Country, Year, Sex, Rate.

e) Split the Rate column into 3 columns named Rate, MinCI and MaxCI. Your dataframe should have 6 columns after this step: Country, Year, Sex, Rate, MinCI, MaxCI.

f) Check the data types and make any changes needed so that Rate, MinCI and MaxCI are numeric. Print a summary of the dataframe.

g) Find and display any rows with invalid data, e.g. a Rate value outside the range from MinCI to MaxCI, or a MinCI larger than MaxCI. If such rows exist, set their MinCI and MaxCI values to NA. Print a summary of the dataframe.

h) Return the dataframe.
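
A minimal base-R sketch of steps a) to h) follows. It assumes each yearly sheet has columns named Country, Both sexes, Male and Female, with every cell holding a string such as "25.3 [20.1-30.5]". Those column names, that cell format, and sheet names matching the year are all assumptions about obesity.xlsx. The helpers load_year_sheet() and split_rate_ci() are hypothetical. Step d) uses rbind() so the sketch runs without tidyverse; tidyr::gather(raw, Sex, Rate, Male, Female) performs the same reshape.

```r
# Hypothetical loader (step a): assumes each sheet is named after its year, e.g. "2016".
# Defined for completeness only; the toy example below does not call it.
load_year_sheet <- function(path, year) {
  readxl::read_excel(path, sheet = as.character(year))
}

# Split strings like "25.3 [20.1-30.5]" into three character columns.
# Cells that do not match this assumed pattern become NA.
split_rate_ci <- function(x) {
  x <- as.character(x)
  x[is.na(x)] <- ""
  pattern <- "^[[:space:]]*([0-9.]+)[[:space:]]*\\[([0-9.]+)-([0-9.]+)\\]"
  m <- regmatches(x, regexec(pattern, x))
  pick <- function(i) vapply(m, function(p) if (length(p) == 4) p[i] else NA_character_,
                             character(1))
  data.frame(Rate = pick(2), MinCI = pick(3), MaxCI = pick(4), stringsAsFactors = FALSE)
}

tidy_obesity_year <- function(raw, year) {
  # (b) drop the combined-sexes column
  raw <- raw[, setdiff(names(raw), "Both sexes"), drop = FALSE]
  # (c) same year on every row
  raw$Year <- year
  # (d) Male/Female columns become rows, with the column name kept in Sex
  long <- rbind(
    data.frame(Country = raw$Country, Year = raw$Year, Sex = "Male",
               Rate = raw$Male, stringsAsFactors = FALSE),
    data.frame(Country = raw$Country, Year = raw$Year, Sex = "Female",
               Rate = raw$Female, stringsAsFactors = FALSE))
  # (e) split the combined string into Rate, MinCI, MaxCI
  long <- cbind(long[, c("Country", "Year", "Sex")], split_rate_ci(long$Rate))
  # (f) make the three value columns numeric
  for (col in c("Rate", "MinCI", "MaxCI")) long[[col]] <- as.numeric(long[[col]])
  print(summary(long))
  # (g) show rows with an inconsistent interval, then blank out their bounds
  bad <- which(long$Rate < long$MinCI | long$Rate > long$MaxCI | long$MinCI > long$MaxCI)
  if (length(bad) > 0) {
    print(long[bad, ])
    long[bad, c("MinCI", "MaxCI")] <- NA
  }
  print(summary(long))
  # (h)
  long
}

# Toy input with invented values; the last Female cell has its bounds reversed.
raw <- data.frame(Country = c("A", "B"),
                  `Both sexes` = c("20.0 [18.0-22.0]", "10.0 [9.0-11.0]"),
                  Male = c("19.0 [17.0-21.0]", "9.5 [8.0-11.0]"),
                  Female = c("21.0 [19.0-23.0]", "12.0 [13.0-11.0]"),
                  check.names = FALSE, stringsAsFactors = FALSE)
res <- tidy_obesity_year(raw, 2016)
```

On the toy frame, the B/Female row has MinCI 13 above MaxCI 11. Step g) therefore prints that row and sets both bounds to NA, while its Rate of 12 is kept.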

1.3 Apply the function to each of the three years in the data to obtain three datasets, then combine their rows into a single dataframe. Print the number of rows of the dataframe.
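
Combining the per-year results takes one do.call(rbind, ...) over lapply(). In this sketch, make_year() is a hypothetical stand-in for the 1.2 function. It returns two invented rows per year so the example runs on its own. dplyr::bind_rows() is a tidyverse alternative.

```r
# Hypothetical stand-in for the 1.2 function: two invented rows per year.
make_year <- function(year) {
  data.frame(Country = c("A", "A"), Year = year, Sex = c("Male", "Female"),
             Rate = c(10, 12), MinCI = c(9, 11), MaxCI = c(11, 13))
}

years <- c(2006, 2011, 2016)
# Build one dataframe per year, then stack them row-wise.
obesity <- do.call(rbind, lapply(years, make_year))
print(nrow(obesity))
```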

1.4 Query the dataframe obtained in 1.3 to print the average obesity rates for females and males in each year.
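
Averages per year and sex can be sketched with aggregate() on a toy dataframe with invented values. The formula interface drops rows with a missing Rate by default (na.action = na.omit). In dplyr, group_by(Year, Sex) followed by summarise(mean(Rate, na.rm = TRUE)) is the equivalent.

```r
# Invented values, two countries in 2006 and one in 2016.
obesity <- data.frame(Year = c(2006, 2006, 2006, 2006, 2016, 2016),
                      Sex = c("Male", "Female", "Male", "Female", "Male", "Female"),
                      Rate = c(10, 14, 20, 16, 30, 24))
# Mean Rate for every Year x Sex combination.
avg <- aggregate(Rate ~ Year + Sex, data = obesity, FUN = mean)
print(avg)
```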

1.5 Sort the dataframe obtained in 1.3 and display the country name, year, sex and rate in descending order of obesity rate. Write the result to a CSV file.
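
The following sketches the sort and the export on invented data. The output path under tempdir() is a placeholder. order(..., decreasing = TRUE) sorts from high to low and places NA rates last.

```r
obesity <- data.frame(Country = c("A", "B", "C"), Year = 2016,
                      Sex = c("Male", "Female", "Male"), Rate = c(12.5, 30.1, 21.0),
                      MinCI = c(11, 28, 19), MaxCI = c(14, 32, 23))
# Highest rate first, keeping only the four requested columns.
sorted <- obesity[order(obesity$Rate, decreasing = TRUE),
                  c("Country", "Year", "Sex", "Rate")]
print(sorted)
out_file <- file.path(tempdir(), "obesity_sorted.csv")  # placeholder output path
write.csv(sorted, out_file, row.names = FALSE)
```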

1.6 Load the data from the Continent worksheet into a dataframe, keeping only two columns, "Country or area" and "Continent".

1.7 Check whether any country in the dataframe obtained in 1.3 is missing from the country list loaded in 1.6. If so, display those country names without duplicates.
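
setdiff() returns the unique elements of its first argument that are absent from the second, so it covers both the check and the "no duplicates" requirement. The two vectors below are invented to show the kind of naming mismatch that can cause such gaps.

```r
# Invented country lists with deliberately mismatched spellings.
obesity_countries <- c("France", "Viet Nam", "France",
                       "Bolivia (Plurinational State of)", "Chile")
continent_countries <- c("France", "Vietnam", "Chile", "Bolivia")
# Unique names present in the first list but not the second.
missing <- setdiff(obesity_countries, continent_countries)
print(missing)
```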

1.8 Display the average obesity rate of females in Europe and North America.
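
One way to answer this is to join on the country name and then filter. The sketch below uses invented data and assumes the country names match exactly once any fixes from 1.7 are applied. It prints the female average for each of the two continents and for both combined, since the question can be read either way.

```r
obesity <- data.frame(Country = c("A", "B", "C", "A", "D"),
                      Sex = c("Female", "Female", "Female", "Male", "Female"),
                      Rate = c(20, 60, 10, 50, 40))
continents <- data.frame(`Country or area` = c("A", "B", "C", "D"),
                         Continent = c("Europe", "North America", "Asia", "Europe"),
                         check.names = FALSE)
# Attach a continent to every obesity row via the country name.
joined <- merge(obesity, continents, by.x = "Country", by.y = "Country or area")
female <- joined[joined$Sex == "Female" &
                   joined$Continent %in% c("Europe", "North America"), ]
by_cont <- tapply(female$Rate, female$Continent, mean, na.rm = TRUE)  # per continent
overall <- mean(female$Rate, na.rm = TRUE)                             # both together
print(by_cont)
print(overall)
```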

Part 2

The online hospitality company Airbnb has made a number of datasets publicly available. This part of the assignment uses a subset of the Melbourne dataset, given in the file AirBnBMel6500.tsv.

It consists of a number of parameters related to properties available for lodging in the Melbourne metropolitan area and can be visualised

Write R code to answer the following.

2.1 Load the dataset from the given file into a dataframe. Rename the columns to remove spaces. Inspect the data and report whether the type of each column is appropriate for the data it holds.
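
The following sketches how to load a tab-separated file and strip spaces from the column names. A tiny invented file is written to tempdir() so the example runs; the real column names in AirBnBMel6500.tsv are not reproduced here. check.names = FALSE keeps the original names; without it, read.delim() would turn the spaces into dots. str() is one way to inspect the column types. Here the review date arrives as text and is converted with as.Date().

```r
# Invented two-row file standing in for the real TSV.
tsv_text <- "listing id\tlast review\tprice\n1\t2019-03-01\t120\n2\t2018-11-20\t85"
tsv_path <- file.path(tempdir(), "toy_listings.tsv")
writeLines(tsv_text, tsv_path)

listings <- read.delim(tsv_path, check.names = FALSE, stringsAsFactors = FALSE)
names(listings) <- gsub(" ", "", names(listings), fixed = TRUE)  # drop spaces
str(listings)                                     # lastreview is read as character
listings$lastreview <- as.Date(listings$lastreview)
```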

2.2 How many listings and how many unique locations are there in the dataframe?
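
The following counts listings and distinct locations on an invented location column; the "Suburb, VIC, Australia" format is an assumption. length(unique(x)) counts NA as one extra value if it is present.

```r
listings <- data.frame(id = 1:5,
                       location = c("Carlton, VIC, Australia", "Fitzroy, VIC, Australia",
                                    "Carlton, VIC, Australia", "St Kilda, VIC, Australia",
                                    "Fitzroy, VIC, Australia"))
n_listings <- nrow(listings)                      # one row per listing
n_locations <- length(unique(listings$location))  # distinct location strings
cat("Listings:", n_listings, " Unique locations:", n_locations, "\n")
```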

2.3 Keep only the listings whose last review was in 2019, removing all others. Print the number of remaining listings and unique locations.
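
The following sketches the 2019 filter, assuming the last-review column has already been converted to Date. last_review is a hypothetical column name. Rows with no review date are dropped explicitly rather than left to NA comparisons.

```r
listings <- data.frame(id = 1:4,
                       location = c("Carlton", "Fitzroy", "Carlton", "Richmond"),
                       last_review = as.Date(c("2019-02-10", "2018-12-31",
                                               "2019-11-05", NA)))
# TRUE only for a known review date falling in 2019.
in_2019 <- !is.na(listings$last_review) & format(listings$last_review, "%Y") == "2019"
listings <- listings[in_2019, ]
cat(nrow(listings), length(unique(listings$location)), "\n")
```
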
2.4 Display the number of listings for each of the three most popular property types, excluding listings with a missing property type.
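
This sketch uses table() on an invented property-type vector. Whether missing types appear as NA or as empty strings in the real file is an assumption, so both are excluded before counting.

```r
property_type <- c("Apartment", "House", "Apartment", NA, "Townhouse", "House",
                   "Apartment", "Unit", "", "Townhouse", "House", "Apartment")
# Drop missing types (NA or empty) before counting.
known <- property_type[!is.na(property_type) & property_type != ""]
counts <- sort(table(known), decreasing = TRUE)
top3 <- counts[1:3]
print(top3)
```
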
2.5 Remove the country name (Australia) from the location column.
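
The following uses sub() with a pattern anchored at the end of the string, so only a trailing "Australia" and the comma before it, if any, are removed. The location formats shown are invented.

```r
location <- c("Carlton, VIC, Australia", "Fitzroy, Australia", "Richmond")
# Strip an optional comma, spaces and "Australia" at the very end.
location <- sub(",?[[:space:]]*Australia[[:space:]]*$", "", location)
print(location)
```
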
2.6 Find the average price of listings in Carlton.
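
This sketch uses invented data and an exact match on "Carlton", so a neighbouring suburb such as "Carlton North" is not included. grepl("Carlton", ...) would include it; which reading is wanted is a judgement call.

```r
listings <- data.frame(location = c("Carlton", "Carlton North", "Carlton", "Fitzroy"),
                       price = c(100, 300, 150, 90))
# Exact match only; NA prices are ignored.
carlton_avg <- mean(listings$price[listings$location == "Carlton"], na.rm = TRUE)
print(carlton_avg)
```
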
2.7 Find the ten locations with the highest average price. Display each location's name and average price in descending order of average price.
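
The following averages prices per location with aggregate() and orders the result, using invented data with single-letter location names. head(..., 10) keeps at most ten rows.

```r
listings <- data.frame(location = c("A", "A", "B", "C", "C", "D"),
                       price = c(100, 300, 50, 400, 200, 80))
avg_price <- aggregate(price ~ location, data = listings, FUN = mean)  # mean per location
avg_price <- avg_price[order(avg_price$price, decreasing = TRUE), ]     # highest first
top10 <- head(avg_price, 10)
print(top10, row.names = FALSE)
```
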
2.8 Display the listing ID and location of listings whose transport description mentions both the words ‘university’ and ‘supermarket’, in any order and in any combination of upper and lower case.
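
Two case-insensitive grepl() calls combined with & match both words in any order. grepl() returns FALSE for a missing description. The data and the column name transport are invented. A pattern such as "\\buniversity\\b" would restrict matches to whole words if that is required.

```r
listings <- data.frame(id = c(11, 12, 13, 14),
                       location = c("Carlton", "Parkville", "Clayton", "Fitzroy"),
                       transport = c("Tram to the University; Supermarket nearby",
                                     "supermarket next door",
                                     "Walk to SUPERMARKET or bus to university",
                                     NA))
# Each word is tested separately, so their order in the text does not matter.
has_both <- grepl("university", listings$transport, ignore.case = TRUE) &
  grepl("supermarket", listings$transport, ignore.case = TRUE)
print(listings[has_both, c("id", "location")])
```
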
2.9 Suppose somebody wants to choose a listing based on the criteria below. Write a function that takes a listing ID and returns a score equal to the sum of these points:

a. Points for price: 200 minus the price, but not less than zero.

b. Points for rating: the review score rating minus 100.

c. Points for popularity, based on the number of reviews: 100 if the listing is in the top quartile (at or above the 75th percentile), 0 if it is below the median, and 50 otherwise.
Which listing ID has the highest score according to the above criteria?
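
The following sketches the scoring function on invented data with hypothetical column names (id, price, review_score, n_reviews). It treats the top popularity band as listings at or above the 75th percentile of review counts, which is an assumption about the intended quartile. Quartiles use quantile()'s default method. A missing review score makes the total NA, and which.max() then skips it.

```r
listings <- data.frame(id = c(101, 102, 103, 104),
                       price = c(150, 250, 80, 120),
                       review_score = c(95, 100, 90, 98),
                       n_reviews = c(10, 200, 50, 5))

score_listing <- function(listing_id, data = listings) {
  row <- data[data$id == listing_id, ]
  stopifnot(nrow(row) == 1)                      # the ID must identify one listing
  # Median and 75th percentile of review counts across all listings.
  q <- quantile(data$n_reviews, c(0.5, 0.75), na.rm = TRUE, names = FALSE)
  price_pts <- max(200 - row$price, 0)           # (a) never below zero
  rating_pts <- row$review_score - 100           # (b)
  # (c) assumption: "first quartile" read as the top quarter of review counts
  popularity_pts <- if (row$n_reviews >= q[2]) 100 else if (row$n_reviews < q[1]) 0 else 50
  price_pts + rating_pts + popularity_pts
}

scores <- sapply(listings$id, score_listing)
best <- listings$id[which.max(scores)]
print(best)
```

For the toy data the median is 30 and the 75th percentile is 87.5, so listing 103 scores 120 - 10 + 50 = 160 and ranks first.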

Attachment: Data Wrangling and R.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd