Write a function that takes in a facility ID

Assignment Help Programming Languages
Reference no: EM132393647

BUS5DWR Data Wrangling and R Assignment - La Trobe University, Australia

The purpose of this assignment is to develop and assess your skills in R programming including summarising, wrangling and plotting data. Using the tidyverse package is recommended but not compulsory. Please read through the entire assignment and understand the submission format and marking rubrics before starting.

Part 1 -

The spreadsheet titled 'sports_and_recreation.xlsx' details the location, types of sport played as well as the condition, age and other details about sports and recreational facilities in Victoria. You will see that it is far from being ready for analysis and needs to be 'wrangled'. Additionally a few errors have been deliberately introduced so these will need to be corrected by initial analysis.

1. Explain why the data in its current form is not considered to be in 'tidy' format.

2. Write R code to read in the data, manipulate it and output the result to a single csv file having the following header row. Each row should provide the details for a single sport within a sports and recreation facility (ignore cases where the sport is unspecified).

facility_ID,facility_name,street_no,street_name,street_type,suburb_town,postcode,LGA,latitude,long itude,sports_played,number_field_courts,field_surface_type,facility_age,facility_condition,facility_up grade_age

Your code will have the following sections (not necessarily in the order given and the process may be iterative as you find more things to do). Please include comments in the code to separate each segment and explain your steps.

a) Read the data into a dataframe (or tibble).

b) Observe the layout of the data and describe any issues you encounter in terms of missing or duplicated data. Make the necessary modification of rows for consistency.

c) Write a function that takes in a facility ID and outputs a dataframe with one or more rows, where each row is specific to the details of a single sport within a facility.

d) Apply the function to each facility name and then combine the rows to form a single dataframe.

e) Split the (latitude, longitude) attribute into two separate columns.

f) Split the address attribute into its components as listed above (note that not all addresses are complete so there may be more than one case to consider).

i. Note that the street number may have two words such as "Lot 1A" but a street number is assumed to have at least one digit character.

ii. Note that the street type may have two words such as "Street North".

iii. Assume the suburb/town attribute is the part of the address in capitals before the postcode.

g) Include the above header row for the dataframe.

h) Do a summary of the dataframe to look for unusual values, then correct them until satisfied with the result.

i) Sort the dataframe by facility_ID, then sports_played.

j) Write the result to a csv file.

3. How many facilities offer softball with at least 5 fields?

Part 2 -

The online hospitality company Airbnb has made publicly available a number of datasets. This part of the assignment makes use of the listings.csv dataset. It consists of a number of parameters related to properties available for lodging in the Melbourne metropolitan area.

Write R code to answer the following.

1. Find the average price of a listing by room type.

2. How many listings contain the following combinations of words (upper or lower case or mixed, either order) in the name column?

a. Spacious, cosy

b. Vibrant, bright

3. How many listings had last review date in the past 14 days up to and including January 6 2018?

4. How many host names (not host ids) are mentioned between 10 and 20 times (inclusive) in the listings?

5. List the five top neighbourhoods sorted in decreasing order by average number of reviews. In a third column show the average value of availability_365 for that neighbourhood.

6. Write a function that has listing id as input and calculates the (direct) distance in kilometres between the location of the listing and the Melbourne CBD with latitude/longitude coordinates (-37.8136o, 144.9631o). Look up and use the haversine formula to perform this using an earth radius of 6370km (you may use an R package). Hence determine the number of listings that are between 5 and 10km from the Melbourne CBD.

7. Suppose somebody wants to choose a listing based on the following criteria. Write one or more functions that inputs a listing id and calculates a score that is the sum of points as below:

a. Points for the location: 50 × (10 minus haversine formula distance in kilometres to the nearest table tennis facility in km) but not less than zero (this uses the sports_played and location attributes in the cleaned dataset from Part 1).

b. Points for price: (300 minus price) but not less than zero.

c. Points for the room type: 100 for Entire home/apt, 50 for Private room, 0 for Shared room.

d. Points for availability: (availability_365) divided by 5.

e. Points by popularity based on the number of reviews: 100 if at least in the top quartile, 0 if less than the median value, 50 otherwise.

Which id has the highest score according to the above system?

Part 3 -

Write a short report describing the two (processed) datasets from parts 1 and 2 through tables and plots with R including the following:

  • A summary of the variables using the summarytools package. Include at least two observations you feel are worth noting.
  • A histogram showing the distribution of the distance of listings from the Melbourne CBD (using the result from Part 2). Use a bin width of 1km.
  • A single map combining a subset of the data from each of the two datasets (e.g. it could be in a single LGA). You will need to use an R package that can map geospatial data.
  • Choose one sport and do a visualisation of your choice including one or more of the following variables from the dataset of part 2.

NumberFieldCourts FieldSurfaceType FacilityAge FacilityCondition FacilityUpgradeAge

Point out any interesting patterns (e.g. trends) you see from your plots.

Attachment:- Assignment Files.rar

Reference no: EM132393647

Questions Cloud

How can leaders employ the focusing illusion : What aspects of the job can employees focus on besides pay and benefits?
Which models have you seen implemented in the workforce : Change management models: Compare and contrast the various change management models. Provide a brief overview of your comparison. Which models have you seen.
Business management for a persuasive essay : What would be a good topic to cover in business management for a persuasive essay?
Describe insights you gained from the experience : Describe insights you gained from the experience and explain how these insights affect your approach to practice. Describe the challenges you experienced.
Write a function that takes in a facility ID : Write a function that takes in a facility ID and outputs a dataframe with one or more rows, where each row is specific to the details of a single sport
How does mobility differ in organizations : Traditional career paths strictly emphasize upward mobility within an organization. How does mobility differ in organizations with innovative career paths?
Explain why employment legislation exists : Explain why employment legislation exists. How might employers play a key role in preventing the need for additional employment legislation? Describe in detail.
Explain the influence of governance and ethics : Weeks 3, 4 and 5 Individual Assignments are integrated to generate a Strategic Management Plan. This is part three of the three part Strategic Management Plan.
How should we prevent mass shootings in the workplace : How should we prevent mass shootings in the workplace? Choose your own topic - email me for approval of an issue not on this list. Narrow down your issue focus.

Reviews

Write a Review

Programming Languages Questions & Answers

  Write a haskell program to calculates a balanced partition

Write a program in Haskell which calculates a balanced partition of N items where each item has a value between 0 and K such that the difference b/w the sum of the values of first partition,

  Create an application to run in the amazon ec2 service

In this project you will create an application to run in the Amazon EC2 service and you will also create a client that can run on local machine and access your application.

  Explain the process to develop a web page locally

Explain the process to develop a Web page locally

  Write functions

These 14 questions covers java class, Array, link list , generic class.

  Programming assignment

If the user wants to read the input from a file, then the output will also go into a different file . If the user wants to read the input interactively, then the output will go to the screen .

  Write a prolog program using swi proglog

Write a Prolog program using swi proglog

  Create a custom application using eclipse

Create a custom Application Using Eclipse Android Development

  Create a application using the mvc architecture

create a application using the MVC architecture. No scripting elements are allowed in JSP pages.

  Develops bespoke solutions for the rubber industry

Develops bespoke solutions for the rubber industry

  Design a program that models the worms behavior

Design a program that models the worm's behavior.

  Writing a class

Build a class for a type called Fraction

  Design a program that assigns seats on an airplane

Write a program that allows an instructor to keep a grade book and also design and implement a program that assigns seats on an airplane.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd