BUS5DWR Data Wrangling and R Assignment

Reference no: EM132575317

BUS5DWR Data Wrangling and R Assignment - La Trobe University, Australia

The purpose of this assignment is to develop and assess your skills in R programming including summarising, wrangling and plotting data. Using the tidyverse package is recommended but not compulsory. Please read through the entire assignment and understand the submission format and marking rubrics before starting.

Part 1

The spreadsheet titled 'obesity.xlsx' records the prevalence of obesity among adults in every country in the world. Each yearly sheet is supposed to record, for each country, the average obesity rate and its 95% confidence interval for females, males and both sexes combined. The dataset covers three years (2006, 2011 and 2016). A separate sheet, Continent, lists the countries in each continent.

You will see that it is far from ready for analysis and needs to be 'wrangled'. In addition, a few errors have been deliberately introduced, and these must be corrected with your R code.

1.1. Explain why the data in its current form is not considered to be in 'tidy' format.

1.2. Write a function that takes a year and returns a dataframe with one or more rows, where each row gives the obesity rate of one sex (Male or Female) in one country in that year, together with its 95% confidence interval. The returned dataframe should have 6 columns (Country, Year, Sex, Rate, MinCI, MaxCI).

a) Load the data from the worksheet of the given year into a dataframe.

b) Drop the column 'Both sexes'.

c) Add a new column named 'Year' containing the given year in every row.

d) Use the gather() function to turn the two columns Male and Female into rows. After this step, the dataframe should have 4 columns: Country, Year, Sex, Rate.

e) Split the Rate column into 3 columns named Rate, MinCI, MaxCI. Your dataframe should have 6 columns after this step: Country, Year, Sex, Rate, MinCI, MaxCI.

f) Check the data types and make any changes needed so that Rate, MinCI and MaxCI are numeric. Print the summary of the dataframe.

g) Find and display any rows with invalid data, e.g. a Rate outside the range from MinCI to MaxCI, or a MinCI larger than MaxCI. If such rows exist, set their MinCI and MaxCI values to NA. Print the summary of the dataframe.

h) Return the dataframe.
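
Steps b) to h) can be sketched in base R as below, so the example runs without extra packages (tidyr::gather(), now superseded by tidyr::pivot_longer(), does the reshaping in step d more concisely). This is a minimal sketch under assumptions: each rate cell is assumed to be a string such as "18.5 [16.0-21.0]", the small data frame stands in for step a), and the country names and values are hypothetical.

```r
# Hypothetical stand-in for one yearly sheet. With the real workbook, step a)
# might be: raw <- readxl::read_excel("obesity.xlsx", sheet = as.character(year))
raw_2016 <- data.frame(
  Country      = c("CountryA", "CountryB"),
  `Both sexes` = c("20.0 [18.0-22.0]", "10.0 [9.0-11.0]"),
  Male         = c("18.5 [16.0-21.0]", "9.0 [8.0-10.0]"),
  Female       = c("21.4 [19.0-24.0]", "12.0 [13.0-11.0]"),  # MinCI > MaxCI
  check.names = FALSE, stringsAsFactors = FALSE
)

tidy_year <- function(raw, year) {
  # b) drop the 'Both sexes' column
  raw <- raw[, setdiff(names(raw), "Both sexes"), drop = FALSE]
  # c) add a Year column holding the given year in every row
  raw$Year <- year
  # d) stack the Male and Female columns into rows (base-R analogue of gather())
  long <- do.call(rbind, lapply(c("Male", "Female"), function(s)
    data.frame(Country = raw$Country, Year = raw$Year, Sex = s,
               Rate = raw[[s]], stringsAsFactors = FALSE)))
  # e) split the assumed "rate [min-max]" strings; cells that do not match become NA
  pattern <- "^\\s*([0-9.]+)\\s*\\[([0-9.]+)-([0-9.]+)\\]\\s*$"
  cells <- long$Rate
  ok <- grepl(pattern, cells)
  grab <- function(i) ifelse(ok, sub(pattern, paste0("\\", i), cells), NA)
  long$Rate  <- grab(1)
  long$MinCI <- grab(2)
  long$MaxCI <- grab(3)
  # f) convert the three columns to numeric
  for (col in c("Rate", "MinCI", "MaxCI")) long[[col]] <- as.numeric(long[[col]])
  print(summary(long))
  # g) flag rows whose interval is inconsistent, show them, then blank the CI
  bad <- with(long, !is.na(Rate) & !is.na(MinCI) & !is.na(MaxCI) &
                    (MinCI > MaxCI | Rate < MinCI | Rate > MaxCI))
  cat("Rows with invalid confidence intervals:\n")
  print(long[bad, ])
  long$MinCI[bad] <- NA
  long$MaxCI[bad] <- NA
  print(summary(long))
  # h) return the tidy data frame
  long
}

res <- tidy_year(raw_2016, 2016)
```

Treating unmatched cells as NA in step e) is a design choice: it keeps malformed entries visible in the summaries instead of silently guessing a value.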

1.3. Apply the function to each of the three years in the data to obtain three datasets, then combine their rows into a single dataframe. Print the number of rows of the dataframe.
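
One way to combine the yearly results is to apply the function over the years with lapply() and stack the pieces with rbind() (dplyr::bind_rows() is the tidyverse equivalent). In this sketch, make_year() is a hypothetical stand-in for the 1.2 function that fabricates two rows per year purely to show the combining step.

```r
# Hypothetical stand-in for the 1.2 function: two made-up rows per year
make_year <- function(year) {
  data.frame(Country = "CountryA", Year = year, Sex = c("Male", "Female"),
             Rate = c(10, 12), MinCI = c(9, 11), MaxCI = c(11, 13))
}

years <- c(2006, 2011, 2016)
# Apply the function to each year, then stack the three data frames by row
obesity <- do.call(rbind, lapply(years, make_year))
print(nrow(obesity))
```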

1.4. Query the dataframe obtained in 1.3 to print the average obesity rates of Female and Male in each year.
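
A grouped mean answers this query. The sketch below uses aggregate() with a formula on a small hypothetical data frame; with the formula interface, rows with an NA Rate are dropped by default (na.action = na.omit).

```r
# Hypothetical combined data in the 1.3 layout (only the columns needed here)
obesity <- data.frame(
  Year = rep(c(2006, 2016), each = 4),
  Sex  = rep(c("Male", "Female"), times = 4),
  Rate = c(10, 20, 12, 22, 14, 24, 16, 26)
)

# Mean rate for every Year x Sex combination
avg <- aggregate(Rate ~ Year + Sex, data = obesity, FUN = mean)
print(avg)
```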

1.5. Sort the dataframe obtained in 1.3 and display the country name, year, sex and rate in descending order of obesity rates. Write the result to a csv file.
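
A possible approach is order() with decreasing = TRUE, keeping only the four requested columns, then write.csv(). The data and the output file name below are hypothetical; the file is written to a temporary directory so the sketch leaves no trace.

```r
# Hypothetical rows in the 1.3 layout
obesity <- data.frame(Country = c("A", "B", "C"), Year = 2016, Sex = "Female",
                      Rate = c(12.5, 30.1, 21.0), MinCI = NA, MaxCI = NA,
                      stringsAsFactors = FALSE)

# Highest rate first, showing only country, year, sex and rate
sorted <- obesity[order(obesity$Rate, decreasing = TRUE),
                  c("Country", "Year", "Sex", "Rate")]
print(sorted)

out_file <- file.path(tempdir(), "obesity_sorted.csv")  # hypothetical file name
write.csv(sorted, out_file, row.names = FALSE)
```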

1.6. Load the data from the Continent worksheet into a dataframe, keeping only two columns, "Country or area" and "Continent".
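
Keeping two columns is a simple column subset by name. With the real workbook the sheet might be read with readxl::read_excel("obesity.xlsx", sheet = "Continent"); here a hypothetical data frame stands in, and its extra "ISO code" column is an assumption used only to show that it gets dropped.

```r
# Hypothetical stand-in for the Continent sheet; check.names = FALSE keeps the spaces
continent_raw <- data.frame(`Country or area` = c("France", "Canada"),
                            `ISO code` = c("FRA", "CAN"),
                            Continent = c("Europe", "North America"),
                            check.names = FALSE, stringsAsFactors = FALSE)

# Keep only the two required columns
continent <- continent_raw[, c("Country or area", "Continent")]
print(continent)
```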

1.7. Check whether any country in the dataframe obtained in 1.3 is missing from the country list loaded in 1.6. If so, display those country names (without duplicates).
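
Base R's setdiff() returns the elements of its first argument that are absent from the second, and it already removes duplicates. The country vectors below are hypothetical and show how a spelling variant would surface.

```r
# Hypothetical country names from the 1.3 dataframe (with a repeat) and from 1.6
obesity_countries   <- c("France", "Canada", "Viet Nam", "Viet Nam")
continent_countries <- c("France", "Canada", "Vietnam")

# Countries in the obesity data that have no match in the continent list
missing <- setdiff(obesity_countries, continent_countries)
print(missing)
```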

1.8. Display the average obesity rate of females in Europe and North America.
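
This needs a join between the obesity data and the continent list, followed by a filtered mean. The sketch below uses merge() on hypothetical data and reports the mean per continent; if a single combined figure is intended instead, mean(female$Rate) gives it.

```r
# Hypothetical inputs in the 1.3 and 1.6 layouts
obesity <- data.frame(Country = c("France", "France", "Canada", "Canada", "Japan"),
                      Sex = c("Female", "Male", "Female", "Male", "Female"),
                      Rate = c(20, 22, 30, 28, 4), stringsAsFactors = FALSE)
continent <- data.frame(`Country or area` = c("France", "Canada", "Japan"),
                        Continent = c("Europe", "North America", "Asia"),
                        check.names = FALSE, stringsAsFactors = FALSE)

# Attach a continent to each row, then keep female rows in the two continents
joined <- merge(obesity, continent, by.x = "Country", by.y = "Country or area")
female <- joined[joined$Sex == "Female" &
                 joined$Continent %in% c("Europe", "North America"), ]

by_continent <- aggregate(Rate ~ Continent, data = female, FUN = mean)
print(by_continent)
```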

Part 2

The online hospitality company Airbnb has made a number of datasets publicly available. This part of the assignment uses a subset of the Melbourne dataset, provided in the AirBnBMel6500.tsv file.

It contains a number of attributes of properties available for lodging in the Melbourne metropolitan area, which can also be visualised.

Write R code to answer the following.

2.1. Load the dataset from the given file into a dataframe. Rename the columns to remove spaces. Inspect the data and report whether the type of each column is appropriate for its contents.
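
A tab-separated file can be read with read.delim(); setting check.names = FALSE keeps the original names so the spaces can be replaced explicitly. In this sketch the inline text stands in for AirBnBMel6500.tsv, and the column names, the "$" price format and the date format are all assumptions. str() shows the types, and the two conversions illustrate columns whose default type would not suit the data.

```r
# Hypothetical excerpt standing in for AirBnBMel6500.tsv
tsv_text <- "listing id\tlast review\tprice\n1\t2019-03-01\t$120.00\n2\t2018-07-15\t$85.00"
listings <- read.delim(text = tsv_text, check.names = FALSE, stringsAsFactors = FALSE)

# Replace runs of whitespace in column names with underscores
names(listings) <- gsub("\\s+", "_", trimws(names(listings)))
str(listings)  # price and last_review arrive as character

# Price stored as "$120.00" text: strip symbols and convert to numeric
listings$price <- as.numeric(gsub("[$,]", "", listings$price))
# Review date stored as text: convert to Date
listings$last_review <- as.Date(listings$last_review)
str(listings)
```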

2.2. How many listings and how many unique locations are there in the dataframe?
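
Both counts are one-liners: nrow() for listings and length(unique()) for locations (dplyr::n_distinct() is a tidyverse alternative). The data below is hypothetical and the location column name is assumed.

```r
# Hypothetical listings with an assumed 'location' column
listings <- data.frame(id = 1:5,
                       location = c("Carlton, VIC, Australia", "Carlton, VIC, Australia",
                                    "Fitzroy, VIC, Australia", "St Kilda, VIC, Australia",
                                    "Fitzroy, VIC, Australia"))

n_listings  <- nrow(listings)                    # one row per listing
n_locations <- length(unique(listings$location))  # distinct location strings
cat("Listings:", n_listings, " Unique locations:", n_locations, "\n")
```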

2.3. Keep only the listings whose last review was in 2019 and remove all others. Print the number of remaining listings and unique locations.
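
Assuming the last-review column has already been converted to Date (as in 2.1), the year can be extracted with format(x, "%Y"). Listings with no review date are excluded explicitly, because an NA in a logical row index would otherwise produce rows of NAs. The data is hypothetical.

```r
# Hypothetical listings; last_review is assumed to be a Date column
listings <- data.frame(id = 1:4,
                       location = c("Carlton", "Carlton", "Fitzroy", "Richmond"),
                       last_review = as.Date(c("2019-01-10", "2018-12-31",
                                               "2019-11-02", NA)))

# Keep rows with a known review date falling in 2019
in_2019 <- !is.na(listings$last_review) &
           format(listings$last_review, "%Y") == "2019"
recent <- listings[in_2019, ]

n_recent <- nrow(recent)
n_recent_locations <- length(unique(recent$location))
cat("Listings:", n_recent, " Unique locations:", n_recent_locations, "\n")
```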

2.4. Display the number of listings of the three most popular property types, excluding listings with a missing property type.
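
Counting with table(), sorting in decreasing order and taking the first three entries is one way to do this. In this sketch, "missing" is assumed to mean either NA or an empty string, and the property types are hypothetical.

```r
# Hypothetical property types, including NA and an empty string
property_type <- c("Apartment", "House", "Apartment", NA, "Townhouse", "House",
                   "Apartment", "Condominium", "", "Townhouse", "Apartment", "House")

# Drop missing values, count each type, and keep the three largest counts
known <- property_type[!is.na(property_type) & property_type != ""]
top3 <- head(sort(table(known), decreasing = TRUE), 3)
print(top3)
```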

2.5. Remove the country name (Australia) from the location column.
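
A regular-expression substitution anchored at the end of the string removes a trailing country name together with its separating comma. The "Suburb, State, Australia" format is an assumption about the location column; values without the suffix are left unchanged.

```r
# Hypothetical location strings in an assumed "Suburb, State, Australia" format
location <- c("Carlton, VIC, Australia", "Fitzroy, Victoria, Australia", "Southbank")

# Remove an optional comma, surrounding spaces and "Australia" at the end
location <- sub(",?\\s*Australia\\s*$", "", location)
print(location)
```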

2.6. Find the average price of listings in Carlton.
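
Assuming price is numeric (see 2.1) and the location has been reduced to a suburb name, an exact comparison keeps neighbouring suburbs such as "Carlton North" out of the average, which a pattern match on "Carlton" would include. The values are hypothetical.

```r
# Hypothetical cleaned listings with numeric prices
listings <- data.frame(location = c("Carlton", "Carlton", "Carlton North", "Fitzroy"),
                       price = c(100, 150, 90, 200))

# Exact match on the suburb name; na.rm ignores listings without a price
carlton_avg <- mean(listings$price[listings$location == "Carlton"], na.rm = TRUE)
print(carlton_avg)
```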

2.7. Find the ten locations with the highest average price. Display each location's name and average price in descending order of average price.
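
A grouped mean followed by a descending sort and head(, 10) gives the ranking. The sketch uses aggregate() on hypothetical data; with fewer than ten locations, head() simply returns all of them.

```r
# Hypothetical cleaned listings
listings <- data.frame(location = c("Carlton", "Carlton", "Fitzroy", "Southbank", "Fitzroy"),
                       price = c(100, 150, 200, 300, 100))

# Mean price per location, then the ten highest in descending order
avg <- aggregate(price ~ location, data = listings, FUN = mean)
top <- head(avg[order(avg$price, decreasing = TRUE), ], 10)
print(top, row.names = FALSE)
```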

2.8. Display the listing ID and location of listings whose transport description mentions both the words "university" and "supermarket", in any order and in upper, lower or mixed case.
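
Two independent case-insensitive tests combined with & handle "in any order", because each word is searched for separately. The column name transit and the rows are assumptions. grepl() returns FALSE for an NA description, so such listings are simply excluded. If whole words are required, patterns such as "\\buniversity\\b" with perl = TRUE would tighten the match.

```r
# Hypothetical listings with an assumed 'transit' description column
listings <- data.frame(id = 1:4,
                       location = c("Carlton", "Parkville", "Clayton", "Fitzroy"),
                       transit = c("Walk to the University and the Supermarket",
                                   "SUPERMARKET nearby; tram to university",
                                   "Close to Monash university", NA),
                       stringsAsFactors = FALSE)

# Each word is matched on its own, ignoring case, so order does not matter
has_both <- grepl("university", listings$transit, ignore.case = TRUE) &
            grepl("supermarket", listings$transit, ignore.case = TRUE)
print(listings[has_both, c("id", "location")])
```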

2.9. Suppose someone wants to choose a listing based on the following criteria. Write a function that takes a listing ID and returns a score equal to the sum of the following points:

a. Points for price: 200 minus the price, but not less than zero.

b. Points for rating: the review score rating minus 100.

c. Points by popularity based on the number of reviews: 100 if at least in the first quartile, 0 if less than the median value, 50 otherwise.
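
The three rules translate directly into a small function. This sketch rests on assumptions: the column names id, price, review_scores_rating and number_of_reviews are hypothetical, the quartiles come from quantile() (default type 7) over all listings, and "at least in the first quartile" is read as the top quartile, i.e. a review count at or above the 75th percentile. A missing rating makes the score NA.

```r
# Hypothetical listings with assumed column names
listings <- data.frame(id = c(11, 12, 13, 14),
                       price = c(150, 250, 80, 120),
                       review_scores_rating = c(95, 100, 88, NA),
                       number_of_reviews = c(5, 40, 120, 10))

score_listing <- function(listing_id, data = listings) {
  row <- data[data$id == listing_id, ]
  if (nrow(row) != 1) stop("listing id not found or not unique")
  # a. 200 minus price, floored at zero
  price_pts <- max(200 - row$price, 0)
  # b. rating minus 100 (zero or negative for ratings out of 100)
  rating_pts <- row$review_scores_rating - 100
  # c. popularity: 100 in the top quartile, 0 below the median, 50 otherwise
  q <- quantile(data$number_of_reviews, c(0.5, 0.75), na.rm = TRUE)
  n <- row$number_of_reviews
  pop_pts <- if (n >= q[[2]]) 100 else if (n < q[[1]]) 0 else 50
  price_pts + rating_pts + pop_pts
}

print(score_listing(13))
```

With these four review counts the median is 25 and the 75th percentile is 60, which is why listing 12 (40 reviews) earns 50 popularity points.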

2.10. Which listing ID has the highest score according to the above criteria?
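
Calling the 2.9 function once per ID with sapply() would work. The sketch below instead vectorises the same rules over all rows, under the same assumptions (hypothetical column names, top-quartile reading of the popularity rule), and picks the maximum with which.max(), which skips NA scores.

```r
# Hypothetical listings with assumed column names
listings <- data.frame(id = c(11, 12, 13, 14),
                       price = c(150, 250, 80, 120),
                       review_scores_rating = c(95, 100, 88, NA),
                       number_of_reviews = c(5, 40, 120, 10))

q <- quantile(listings$number_of_reviews, c(0.5, 0.75), na.rm = TRUE)
popularity <- ifelse(listings$number_of_reviews >= q[[2]], 100,
                     ifelse(listings$number_of_reviews < q[[1]], 0, 50))
scores <- pmax(200 - listings$price, 0) +
          (listings$review_scores_rating - 100) + popularity

# which.max() ignores NA scores (e.g. listings without a rating)
best_id <- listings$id[which.max(scores)]
print(best_id)
```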
