Plot of the relationship between journey time and cost

Reference no: EM131757175

Data Cleaning

Problem 1:
The dataset is missing a lot of data. Suggest an explanation for:

(a) missing payment card numbers

(b) prices (see footnote 1), and

(c) end addresses

(d) Finally, calculate what percentage of prices are missing, and suggest a way to deal with both missing prices and other missing data before using the dataset for analysis.

Whatever you decide to do in (d), apply it to the given dataset - including resolving values that are present but inconsistent, as well as the missing data - and save the result as stage1.csv in your final submission.
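As an illustration of the calculation in part (d), the percentage of missing prices can be computed by counting empty or placeholder values in the price column. The sketch below is in Python rather than the R the assignment asks for, and it runs on a small made-up sample: the column names (`start_address`, `end_address`, `price`) and the set of strings treated as "missing" are assumptions, not taken from the actual dataset.

```python
import csv
import io

# Hypothetical sample; the real column names and values in the assignment's
# dataset are not shown in this document.
SAMPLE = """start_address,end_address,price
A,B,12.5
A,,
C,D,NA
B,A,9.0
"""

# Assumed spellings of "missing"; adjust after inspecting the real file.
MISSING_TOKENS = {"", "na", "n/a", "null", "none"}


def is_missing(value):
    """Treat None, blanks and common placeholder strings as missing."""
    return value is None or value.strip().lower() in MISSING_TOKENS


def percent_missing(rows, column):
    """Return the percentage of rows whose `column` value is missing."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to analyse")
    missing = sum(1 for row in rows if is_missing(row.get(column)))
    return 100.0 * missing / len(rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(f"{percent_missing(rows, 'price'):.1f}% of prices are missing")
```

The same function can be pointed at any other column (for example the end addresses) to compare how much of each field is missing before deciding whether to drop or impute rows.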

Descriptive Statistics
For this problem, include in your LaTeX file itself the R code you used to answer each part of the question.

Problem 2:

a) What is the average journey cost?

b) Which journey (between which addresses) is the most popular and what is the median time for this journey?

c) What are the average and median durations of the journeys? Hint: You may find it helpful to create an additional column in Excel to hold the calculated journey times, and read the resulting file into R, before doing the analysis.
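The journey-time column mentioned in the hint is simply the end timestamp minus the start timestamp. A minimal sketch of that calculation follows, in Python rather than the R the assignment requires; the timestamps are invented and the timestamp format is an assumption, since the layout of the real dataset is not shown here.

```python
from datetime import datetime
from statistics import mean, median

# Assumed timestamp layout; change it to match the real file.
TIME_FORMAT = "%Y-%m-%d %H:%M"

# Hypothetical (start, end) pairs, not taken from the assignment's data.
journeys = [
    ("2017-03-01 08:00", "2017-03-01 08:20"),
    ("2017-03-01 18:05", "2017-03-01 18:35"),
    ("2017-03-02 23:50", "2017-03-03 00:30"),  # crosses midnight
]


def duration_minutes(start, end):
    """Journey time in minutes between two timestamp strings."""
    started = datetime.strptime(start, TIME_FORMAT)
    ended = datetime.strptime(end, TIME_FORMAT)
    return (ended - started).total_seconds() / 60.0


durations = [duration_minutes(s, e) for s, e in journeys]
print("mean:", mean(durations), "median:", median(durations))
```

Parsing full date-times, rather than subtracting clock times alone, keeps journeys that cross midnight from producing negative durations.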

Modelling

For this problem, describe in your LaTeX file itself the R code you used to answer each part of the question.

Problem 3:

Footnote 1 (to Problem 1 (b)): The prices are not missing because the corresponding transaction was paid in cash.

a) Create a plot of the relationship between journey time and cost

b) Is there a linear relationship between these variables? Show your reasoning for this answer, mentioning the type of model you use to answer this question

c) Can you suggest a rough set of categories by which journeys can be clustered? Suggest the model that you can use to find this out, and the values representative of the clusters. Furthermore, explain what these clusters can be understood to represent conceptually with respect to the journeys.

d) Run the appropriate validation test for your clustering model, and explain how this affects your certainty about your categories
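For part (b), one common way to examine a linear relationship is an ordinary least squares fit of cost on journey time, reporting the slope, intercept and coefficient of determination (R squared). The sketch below shows that calculation in plain Python rather than the R the assignment asks for; the sample times and costs are hypothetical, and the choice of simple linear regression is an assumption about what the question intends.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable gives no meaningful linear fit")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical journey times (minutes) and costs (SAR).
times = [10.0, 20.0, 30.0, 40.0]
costs = [12.0, 19.0, 33.0, 38.0]

slope, intercept, r_squared = fit_line(times, costs)
print(f"cost = {intercept:.2f} + {slope:.2f} * time, R^2 = {r_squared:.3f}")
```

A high R squared on its own does not establish linearity, so the scatter plot from part (a) and a look at the residuals should accompany the fitted numbers.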

Question Refinement and Hypothesis Testing

For this problem, describe in your LaTeX file itself the R code you used to answer each part of the question.

Problem 4:

As a data scientist hired by Uber, you have been asked simply to figure out ways to reduce costs. However, you only have the attached customer data as input. Uber has said that this customer's behaviour is representative of an important sector of the market in the Al Naseem area.

Your task is to figure out whether the question of 'how costs can be reduced' can be answered by the given data. Your initial consultation with someone in finance reveals that troublesome customers, defined as indecisive customers who keep cancelling their Ubers after ordering them, before 5 minutes have elapsed, are an increasing cost.

Your further discussion with the product engineering manager shows that there is an idea for creating a private rating of Uber users based on this troublesome behaviour: users who cancel a large percentage of their trips will be given low ratings, and users with low ratings will not be 'actually assigned Ubers' (even though the application may show otherwise) until a few minutes after they have ordered one.

a) Explain briefly how costs can be reduced with such a rating system.

b) Suggest a refined question about saving costs, and what you expect to benefit from answering this question.

c) What would be a way to answer this question with the given data?

d) Suggest a hypothesis test, stating the null and alternative hypotheses. Assume here that if the user cancels 30 percent or more of their rides then they will get low ratings.

e) Perform the test on the attached dataset. Are you inclined to accept the null or the alternative hypothesis? Explain your choice.

f) Given that the user data you analysed is representative of 1000 users, and assuming that cancellations within 5 minutes cost on average 3 SAR, how much money do you think you can save and over how many months?
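One possible framing of parts (d) to (f), offered only as a sketch: take the null hypothesis to be that the user's true cancellation rate equals the 30 percent threshold, with the one-sided alternative that it is higher, and compute an exact binomial p-value from the observed counts. The code is Python rather than the R the assignment requires. The trip and cancellation counts, the choice of an exact binomial test, and the number of avoided cancellations per user per month are all hypothetical; only the 30 percent threshold, the 1000 users and the 3 SAR cost come from the problem statement.

```python
from math import comb


def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): the exact one-sided p-value."""
    if not (0 <= k <= n) or not (0.0 <= p0 <= 1.0):
        raise ValueError("need 0 <= k <= n and 0 <= p0 <= 1")
    return sum(
        comb(n, i) * p0**i * (1.0 - p0) ** (n - i) for i in range(k, n + 1)
    )


def monthly_saving(users, avoided_per_user_per_month, cost_per_cancellation):
    """Estimated saving per month if that many cancellations are avoided."""
    return users * avoided_per_user_per_month * cost_per_cancellation


THRESHOLD = 0.30  # from the problem statement

# Hypothetical counts; replace with the totals from the real dataset.
n_trips, n_cancelled = 20, 10
p_value = binomial_upper_tail(n_cancelled, n_trips, THRESHOLD)
print(f"observed rate = {n_cancelled / n_trips:.2f}, p-value = {p_value:.4f}")

# 1000 users and 3 SAR come from part (f); 2 avoided cancellations per user
# per month is an invented figure used only to show the arithmetic.
print("saving per month (SAR):", monthly_saving(1000, 2, 3.0))
```

A small p-value here would count as evidence that the cancellation rate exceeds 30 percent; multiplying the monthly saving by the number of months covered by the data gives the total asked for in part (f).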

Presentation

Problem 5:

Communicate your problem, question, refined question, statistical test results and overall conclusions from Problem 4 to your manager using the necessary visualisations. You should use your results from the prior problems to inspire or support your final argument.

Attachment:- Data.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd