Train a model to predict traffic speed

Reference no: EM133170154

Scenario: You're a data scientist at Uber, sitting in a war room on March 16, 2020, one day after California-wide COVID lockdown measures began and the day shelter-in-place measures are announced in the Bay Area. The entire data science department is on fire: all of your existing traffic models have regressed significantly. Given the sudden change in traffic patterns (i.e., no traffic at all), the company's traffic estimates are wildly incorrect.

This is a top priority for the company. Since traffic estimates are used directly for pricing strategies, this is actively costing the company millions every hour. You are tasked with fixing these models.

Takeaways: How do you "fix" models that have learned biases from pre-lockdown traffic? How do you train new ones with just 24 hours of data? What sorts of data do you examine to better understand the situation? In the midst of company-wide panic, you'll need strong inferential acumen to lead a robust data science response.

In this project, we'll walk you through a simulated war room data science effort, culminating in some strategies for fixing, online, models that are experiencing large distributional shifts in their data. We'll explore traffic data provided by the Uber Movement dataset, specifically around the start of COVID shutdowns in March 2020. Your project is structured around the following ideas:

1. Guided data cleaning: Clustering data spatially

a. Load Uber traffic speeds dataset

b. Map traffic speeds to Google Plus Codes (spatially uniform)
   i. Load node-to-gps-coordinates data
   ii. Map traffic speed to GPS coordinates
   iii. Convert GPS coordinates to plus code regions
   iv. Sanity check number of plus code regions in San Francisco
   v. Plot a histogram of the standard deviation in speed, per plus code region.

c. Map traffic speeds to census tracts (spatially non-uniform)
   i. Download census tracts geojson
   ii. Map traffic speed to census tracts
   iii. Sanity check number of census tracts in San Francisco with data.
   iv. Plot a histogram of the standard deviation in speed, per census tract.

d. What defines a "good" or "bad" spatial clustering?

2. Guided EDA: Understanding COVID lockdown impact on traffic

a. How did lockdown affect average traffic speeds?
   i. Sort census tracts by average speed, pre-lockdown.
   ii. Sort census tracts by average speed, post-lockdown.
   iii. Sort census tracts by change in average speed, from pre to post lockdown.
   iv. Quantify the impact of lockdown on average speeds.
   v. Quantify the impact of pre-lockdown average speed on change in speed.

b. What traffic areas were impacted by lockdown?
   i. Visualize heatmap of average traffic speed per census tract, pre-lockdown.
   ii. Visualize change in average daily speeds pre vs. post lockdown.
   iii. Quantify the impact of lockdown on daily speeds, spatially.

3. Open-Ended EDA: Understanding lockdown impact on traffic times

a. Download Uber Movement (Travel Times) dataset

4. Guided Modeling: Predict traffic speed post-lockdown

a. Predict daily traffic speed on pre-lockdown data
   i. Assemble dataset to predict daily traffic speed.
   ii. Train and evaluate linear model on pre-lockdown data.

b. Understand failures on post-lockdown data
   i. Evaluate on post-lockdown data
   ii. Report model performance temporally

c. "Fix" model on post-lockdown data
   i. Learn delta off of a moving bias
   ii. Does it "solve itself"? Does the pre-lockdown model predict, after the change point?
   iii. Naively retrain model with post-lockdown data
   iv. What if you just ignore the change point?

5. Open-Ended Modeling: Predicting travel times post-lockdown
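Step 4c.i ("learn delta off of a moving bias") is stated only as a heading, so the following is a minimal sketch of one possible reading: keep the stale pre-lockdown model, and add to each prediction the average residual observed over the last few days. The function name, the window length, and the synthetic speeds are all assumptions made for illustration; none of them come from the assignment notebook.

```python
import numpy as np

def moving_bias_correction(y_true, y_pred, window=3):
    """Shift each prediction by the mean residual of the previous `window` days."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    corrected = y_pred.copy()
    for t in range(1, len(y_pred)):
        # Only residuals from days strictly before t are used, so the
        # correction could be computed online as new data arrives.
        recent = residuals[max(0, t - window):t]
        corrected[t] = y_pred[t] + recent.mean()
    return corrected

# Synthetic illustration (not real Uber data): a stale model keeps predicting
# 20 mph while true speeds jump to 30 mph at day 10, the change point.
y_true = np.array([20.0] * 10 + [30.0] * 10)
y_pred = np.full(20, 20.0)
corrected = moving_bias_correction(y_true, y_pred, window=3)

mae_before = float(np.abs(y_true - y_pred).mean())
mae_after = float(np.abs(y_true - corrected).mean())
print(f"MAE without correction: {mae_before:.2f}, with correction: {mae_after:.2f}")
```

In this toy setup the corrected predictions catch up with the new speed level once the window contains only post-change days. A shorter window adapts faster but is noisier on real data.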

This is the final assignment for the graduate class on Data Science.
The assignment is in two parts. The first part covers questions 1 and 2 of the Jupyter notebook. The second part covers questions 3 to 5 of the Jupyter notebook.

Please refer to fa21_discussions_dev.pdf (Report Format and Submission) for details on the assignment. It describes what is expected from the final report. Please note that there are three types of assignment: AQI, COVID and Traffic. My dataset is "Traffic".

Dataset: Traffic dataset
This is the data assigned to us. It contains date-wise traffic speed information for San Francisco. The objective is to understand the impact of COVID on traffic speed.

Part 1 of the assignment:

In the first part of the assignment, we first prepared the data: we created a geodataframe, performed a spatial join, and finally conducted some EDA to understand whether traffic speeds in San Francisco were impacted by COVID, comparing before and after lockdown.
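A spatial join assigns each speed observation to the region that contains it. As a rough illustration of the underlying idea, the sketch below uses a plain ray-casting point-in-polygon test instead of a geospatial library. The tract IDs, polygons, and speeds are made-up toy values, and longitude/latitude are treated as flat coordinates, which is a simplification.

```python
from statistics import mean

def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside `polygon`, a list of (lon, lat) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_to_tract(lon, lat, tracts):
    """Return the ID of the first tract containing the point, or None."""
    for tract_id, polygon in tracts.items():
        if point_in_polygon(lon, lat, polygon):
            return tract_id
    return None

# Hypothetical tracts: two adjacent unit squares.
tracts = {
    "tract_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "tract_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
# Hypothetical observations: (lon, lat, speed in mph).
observations = [(0.5, 0.5, 25.0), (1.5, 0.5, 40.0), (0.25, 0.75, 35.0), (3.0, 3.0, 10.0)]

speeds_by_tract = {}
for lon, lat, speed in observations:
    tract_id = assign_to_tract(lon, lat, tracts)
    if tract_id is not None:  # points outside every tract are dropped
        speeds_by_tract.setdefault(tract_id, []).append(speed)

mean_speed = {tract_id: mean(values) for tract_id, values in speeds_by_tract.items()}
print(mean_speed)
```

In the notebook itself a geospatial library would normally do this step, including proper handling of coordinate reference systems and boundary cases that this sketch ignores.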

Furthermore, based on the preliminary EDA, we submitted a hypothesis (in the traffic design doc) to test in the second part of the assignment.

Hypothesis: Even though the number of vehicles on the road decreased after the COVID lockdown, speeds did not change drastically in specific locations, because other confounding variables, such as traffic light density, affect traffic speed at any given location.

Approach for the Part 2:
In the second part of the assignment, we need to understand whether the COVID lockdown itself affected vehicle speeds, or whether confounding factors such as traffic lights explain them. The idea is that an area with a large number of traffic lights would have lower speeds, and that this would stay constant post-COVID as well.

Hence we downloaded datasets for traffic lights, stop signs, and speed limits (attached) in the city of San Francisco.
We are also given the Daily Travel Times dataset, which has date-wise data on mean travel speed for each specific geography.
The geography can be traced through the geometry column and/or the corresponding movement ID column.

Now we need to merge the traffic light data with the previous (part 1) datasets, speeds_to_tract and times_to_tract, to understand whether speed has changed before and after COVID.

Question 1:
This is an open-ended EDA. We can run EDA to test:
- Whether there is a change in traffic speed before and after COVID
- Whether the change in traffic speed is related to a high number of traffic signals or stop signs, or to the speed limit
- Whether the change in speed is correlated with traffic signals, stop signs, speed limits, or the COVID lockdown
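The checks listed above could be sketched roughly as follows: compute each tract's mean speed before and after the lockdown date, take the difference, merge in a per-tract signal count, and look at the correlation. The column names (`tract_id`, `speed_mph`, `n_signals`) and every number in the toy frames are assumptions for illustration, not the schema or values of the real datasets.

```python
import pandas as pd

LOCKDOWN = pd.Timestamp("2020-03-16")  # scenario date from the assignment text

# Toy daily speeds for three hypothetical tracts (two days before, two after).
speeds = pd.DataFrame({
    "tract_id": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "date": pd.to_datetime(["2020-03-10", "2020-03-11", "2020-03-17", "2020-03-18"] * 3),
    "speed_mph": [15, 17, 17, 19, 24, 26, 30, 32, 38, 42, 50, 54],
})
# Toy count of traffic signals per tract.
signals = pd.DataFrame({"tract_id": ["A", "B", "C"], "n_signals": [40, 15, 2]})

# Label each row as pre- or post-lockdown, then average per tract and period.
speeds["period"] = (speeds["date"] >= LOCKDOWN).map({False: "pre", True: "post"})
mean_by_period = (
    speeds.groupby(["tract_id", "period"])["speed_mph"].mean().unstack("period")
)
mean_by_period["change"] = mean_by_period["post"] - mean_by_period["pre"]

# Join the speed change with the signal counts and correlate the two.
merged = mean_by_period.reset_index().merge(signals, on="tract_id", how="inner")
corr = merged["change"].corr(merged["n_signals"])  # Pearson correlation
print(merged[["tract_id", "pre", "post", "change", "n_signals"]])
print(f"Correlation between speed change and signal count: {corr:.2f}")
```

With only three made-up tracts the correlation value is purely illustrative. On the real data it would be worth checking stop signs and speed limits the same way, and remembering that a correlation alone does not separate the lockdown effect from the confounders.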

Question 2:
This is guided modelling: in this step, you'll train a model to predict traffic speed.
In this question we are given prompts, and we just need to write the code as directed.
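As a minimal sketch of what such a guided step might look like, the code below fits a least-squares linear model that predicts a day's speed from the previous three days, trains it on pre-lockdown days only, and then scores it on post-lockdown days. The lag-feature design, the helper names, and the synthetic speed series are assumptions; the notebook's actual prompts may assemble the dataset differently.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds the previous n_lags values, y the next one."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
    y = series[n_lags:]
    return X, y

def fit_linear(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

# Synthetic daily speeds (not real data): about 20 mph before, about 30 mph after.
rng = np.random.default_rng(0)
n_pre, n_post, n_lags = 60, 14, 3
series = np.concatenate([
    20 + rng.normal(0, 0.5, n_pre),
    30 + rng.normal(0, 0.5, n_post),
])

X, y = make_lagged(series, n_lags)
split = n_pre - n_lags  # rows whose target day is still pre-lockdown
coef = fit_linear(X[:split], y[:split])

mae_pre = mae(y[:split], predict(coef, X[:split]))
mae_post = mae(y[split:], predict(coef, X[split:]))
print(f"MAE pre-lockdown: {mae_pre:.2f} mph, post-lockdown: {mae_post:.2f} mph")
```

The gap between the two errors is the failure that the later "fix" steps are meant to address. Note that the pre-lockdown score here is measured on the training rows, so a held-out pre-lockdown split would give a fairer baseline.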

Question 3:
Open-Ended Modeling: Predicting travel time post-lockdown.

a: Train a baseline model of your choice using any supervised learning approach we have studied; you are not limited to a linear model.

b: Improve on your baseline model. Specify the model you designed and its input features. Justify why you chose these features and their relevance to your model's predictions.

For this question, we will have to develop, train, and improve the model.
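One way to picture the baseline-then-improve workflow is sketched below, under stated assumptions: the baseline is a linear regression on day-of-week indicators, and the improved model adds a single post-lockdown indicator as an extra feature. Both feature choices and the synthetic travel times are illustrative only and are not the assignment's expected answer.

```python
import numpy as np

# Synthetic daily travel times in seconds (not real Uber Movement data):
# weekdays are slower, and everything gets faster once lockdown starts.
rng = np.random.default_rng(0)
n_pre, n_post = 42, 14
day = np.arange(n_pre + n_post)
dow = day % 7
lockdown = (day >= n_pre).astype(float)
travel_time = 600.0 + 60.0 * (dow < 5) - 150.0 * lockdown + rng.normal(0, 10, day.size)

def design(dow, lockdown=None):
    """One-hot day-of-week columns, optionally followed by a lockdown indicator."""
    cols = [np.eye(7)[dow]]
    if lockdown is not None:
        cols.append(lockdown[:, None])
    return np.hstack(cols)

# Train on all pre-lockdown days plus the first post-lockdown week;
# test on the second post-lockdown week.
train = day < n_pre + 7
test = ~train

def fit_and_score(X):
    coef, *_ = np.linalg.lstsq(X[train], travel_time[train], rcond=None)
    pred = X[test] @ coef
    return float(np.mean(np.abs(travel_time[test] - pred)))

mae_baseline = fit_and_score(design(dow))
mae_improved = fit_and_score(design(dow, lockdown))
print(f"Baseline MAE: {mae_baseline:.1f} s, with lockdown feature: {mae_improved:.1f} s")
```

The justification asked for in part b would follow the same pattern: name each added feature, explain why it should carry information about post-lockdown travel times, and show the change in held-out error.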

Attachment:- discussions_dev.rar
