Analyze the Performance of U.S. Airlines

Reference no: EM131792801

Assignment - Big Data Management

Domain: Airline Industry

Project: U.S. Airline/Air-Traffic Analysis

Analysis and Data Synopsis:

The data considered for our project is the 2016 U.S. air-traffic data from the U.S. Department of Transportation (DOT), Bureau of Transportation Statistics (BTS). It covers 16 U.S. air carriers that each account for at least 1 percent of total domestic scheduled-service passenger revenues, plus two other carriers that report voluntarily. The data cover nonstop scheduled-service flights between points within the United States (including territories).

The analysis aims to study air-traffic patterns in 2016 across U.S. airports, airlines and sectors (routes). It also aims to analyze the performance of U.S. airlines across various parameters, study the factors affecting that performance, and identify performance patterns across different time periods. The dataset is about 465 MB in size and contains about 465k records.

Scope of the Analysis:
1. Overall Summary of the Air-Traffic Dataset

2. U.S. Air-Traffic/Airport Operation Analysis 2016 -
a. Airports with the highest air-traffic volume
b. Airports served by the largest number of airlines
c. Airport service patterns across time periods

3. U.S. Airline Functionality Analysis 2016 -
a. Aircraft operating the most flights
b. Airport coverage density
c. Weekday air-travel density analysis

4. Flight Performance Analysis -
a. Flight delays - causes, volume, frequency
b. Flight cancellations - count, airports, airlines, period, causes

5. Diverted Flights Analysis -
a. Summary
b. Cause and density
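Several of the items above (2a, 2b and 3c) reduce to simple group-and-count aggregations. In Spark these would typically be `groupBy(...).count()` calls on the flights DataFrame; the plain-Python sketch below only illustrates the aggregation logic on a few made-up rows. The column names `FL_DATE`, `CARRIER`, `ORIGIN` and `DEST` are hypothetical and are not confirmed by this document.

```python
from collections import Counter, defaultdict
from datetime import date

# Made-up sample rows; the column names are assumptions, not the real extract's schema.
flights = [
    {"FL_DATE": "2016-01-04", "CARRIER": "AA", "ORIGIN": "ATL", "DEST": "ORD"},
    {"FL_DATE": "2016-01-04", "CARRIER": "DL", "ORIGIN": "ATL", "DEST": "LAX"},
    {"FL_DATE": "2016-01-05", "CARRIER": "UA", "ORIGIN": "ORD", "DEST": "ATL"},
    {"FL_DATE": "2016-01-09", "CARRIER": "DL", "ORIGIN": "LAX", "DEST": "ATL"},
]


def airport_traffic(rows):
    """2a: departures plus arrivals per airport."""
    counts = Counter()
    for r in rows:
        counts[r["ORIGIN"]] += 1
        counts[r["DEST"]] += 1
    return counts


def airlines_per_airport(rows):
    """2b: number of distinct carriers seen at each airport."""
    carriers = defaultdict(set)
    for r in rows:
        carriers[r["ORIGIN"]].add(r["CARRIER"])
        carriers[r["DEST"]].add(r["CARRIER"])
    return {airport: len(c) for airport, c in carriers.items()}


def flights_per_weekday(rows):
    """3c: flights per day of week, Monday=0 ... Sunday=6."""
    return Counter(date.fromisoformat(r["FL_DATE"]).weekday() for r in rows)
```

For example, `airport_traffic(flights).most_common(1)` returns the busiest airport in the sample, `[('ATL', 4)]`.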

The assignment contains four questions that build familiarity with aspects of Apache Spark. The first three questions require Spark programming; the last asks you to understand existing code and explain it in simple terms.

Q1. Consider the two data files (users.csv, transactions.csv). The users file has the following fields:
a) UserID
b) EmailID
c) NativeLanguage
d) Location

Transactions file has the following fields:
a) Transaction_ID
b) Product_ID
c) UserID
d) Price
e) Product_Description

Using Spark Core (i.e. without Spark SQL), find:
a) The count of unique locations where each product is sold.
b) The products bought by each user.
c) The total spending by each user on each product.

You may not use Spark SQL for this.
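Each part of Q1 maps onto pair-RDD operations in Spark Core: key both files by UserID, `join` them, then use `distinct`, `groupByKey` or `reduceByKey`. The sketch below reproduces that pipeline with plain Python dictionaries and made-up records so the logic can be checked without a cluster; the comments name the PySpark RDD call each step roughly corresponds to. It is an illustration under those assumptions, not a submitted solution.

```python
from collections import defaultdict

# Made-up records in the field order given in the question.
# In the real CSV every field is a string, so Price would need float().
users = [
    # (UserID, EmailID, NativeLanguage, Location)
    (1, "a@example.com", "EN", "US"),
    (2, "b@example.com", "FR", "FR"),
    (3, "c@example.com", "EN", "US"),
]
transactions = [
    # (Transaction_ID, Product_ID, UserID, Price, Product_Description)
    (100, "P1", 1, 10.0, "pen"),
    (101, "P1", 2, 10.0, "pen"),
    (102, "P1", 3, 10.0, "pen"),
    (103, "P2", 1, 25.0, "book"),
    (104, "P2", 1, 25.0, "book"),
]

# ~ user_loc = users.map(lambda u: (u[0], u[3]))   # UserID -> Location
location_by_user = {u[0]: u[3] for u in users}


def unique_locations_per_product(txns):
    # ~ txns.map(lambda t: (t[2], t[1])).join(user_loc)
    #       .values().distinct().countByKey()
    pairs = {(t[1], location_by_user[t[2]]) for t in txns if t[2] in location_by_user}
    counts = defaultdict(int)
    for product, _location in pairs:
        counts[product] += 1
    return dict(counts)


def products_per_user(txns):
    # ~ txns.map(lambda t: (t[2], t[1])).distinct().groupByKey().mapValues(sorted)
    grouped = defaultdict(set)
    for t in txns:
        grouped[t[2]].add(t[1])
    return {user: sorted(products) for user, products in grouped.items()}


def spending_per_user_product(txns):
    # ~ txns.map(lambda t: ((t[2], t[1]), t[3])).reduceByKey(lambda a, b: a + b)
    totals = defaultdict(float)
    for t in txns:
        totals[(t[2], t[1])] += t[3]
    return dict(totals)
```

On the sample data, product P1 is sold in two locations (US and FR) and user 1 spends 50.0 on P2.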

Q2. For this question, use the attached JSON file (tweets.json). Use the Spark SQL library to answer the following questions:
a) Save the dataset as a DataFrame, and print the schema.
b) Get all of the tweets made by a given user (any user will do; it should be possible to substitute another user name to get that user's tweets).
c) Find the count of all tweets by each user.
d) Get a list of all of the people who are mentioned in tweets.
e) Count the number of times each person is mentioned in the entire dataset of tweets.
f) Give the top 50 most-mentioned users.
g) Get a list of all hashtags mentioned in the dataset.
h) Find how many times each hashtag is mentioned in the dataset.
i) Get a list of all of the people who are located in a particular city (e.g. Paris).
j) Get the country-wise distribution of users, and find which country ranks highest in terms of number of tweets and number of users.
k) Find the number of tweets by users from France that mention Paris.
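Parts (b), (d) to (h) and (k) of Q2 come down to filtering by user and pulling `@mentions` and `#hashtags` out of the tweet text. In Spark SQL this is usually done by splitting the text into tokens, `explode`-ing them and filtering on the leading character; the plain-Python sketch below shows the same extraction and counting with regular expressions. The field names `user`, `country` and `text` are hypothetical, since the schema of tweets.json is not given here.

```python
import re
from collections import Counter

# Made-up records; the real tweets.json schema may use different field names.
tweets = [
    {"user": "alice", "country": "France", "text": "Lunch in Paris with @bob #food"},
    {"user": "bob", "country": "USA", "text": "@alice @carol see you soon #travel #food"},
    {"user": "carol", "country": "France", "text": "Rainy day #weather"},
]

MENTION = re.compile(r"@(\w+)")  # word characters after '@'
HASHTAG = re.compile(r"#(\w+)")  # word characters after '#'


def tweets_by(user, rows):
    """(b) All tweet texts by one user; pass a different name to switch users."""
    return [t["text"] for t in rows if t["user"] == user]


def mention_counts(rows):
    """(d), (e) Every mentioned user with the number of times mentioned."""
    return Counter(m for t in rows for m in MENTION.findall(t["text"]))


def hashtag_counts(rows):
    """(g), (h) Every hashtag with its number of occurrences."""
    return Counter(h for t in rows for h in HASHTAG.findall(t["text"]))


def france_tweets_mentioning_paris(rows):
    """(k) Tweets from users in France whose text contains 'Paris' (case-sensitive)."""
    return sum(1 for t in rows if t["country"] == "France" and "Paris" in t["text"])
```

`mention_counts(tweets).most_common(50)` would give the list for part (f). The `"Paris" in text` test is case-sensitive; lower-casing both sides is a reasonable alternative.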

Q3. For this question, use the concepts learnt in the Graph Analytics session together with the datasets trip.csv and station.csv. The two files contain bike-sharing data provided by the SF Bay Area Portal. The trip.csv file contains the following fields:
a) tripId
b) Duration
c) StartDate
d) StartStation
e) StartTerminal
f) EndDate
g) EndStation
h) EndTerminal
i) BikeID
j) SubscriberType
k) ZipCode
The station.csv file contains the following fields:

a) stationId
b) Name
c) Lat (Latitude)
d) Long (Longitude)
e) Dockcount
f) Landmark
g) Installation

Using the two files, perform the following:

a) Import the data and create a graph using GraphFrames. (Hint: the graph has nodes and edges. Nodes are the individual stations, so the id field is the name field in station.csv. Edges have src and dst, which are the StartStation and EndStation fields in trip.csv, respectively. Other fields can be used as properties of nodes and edges.)
b) Find the number of incoming and outgoing connections for each node and print the top 10 nodes.
c) Find the most common direct routes that people take and print the top 10.
d) Using the analysis in (b), find the stations where people most frequently start their trips but do not come back. (Hint: consider incoming connections as a ratio of outgoing connections.) Print the top 10 such stations.
e) Find all patterns where a station a is connected to station b and b is connected to c, but c is not directly connected to a.
f) Run the PageRank algorithm to find the most important station in the entire graph.
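Parts (b) to (e) of Q3 can be reasoned about on a plain edge list before moving to GraphFrames, where the corresponding tools are `g.inDegrees`, `g.outDegrees`, `g.edges.groupBy("src", "dst").count()` and motif finding with a negated edge, e.g. `g.find("(a)-[]->(b); (b)-[]->(c); !(c)-[]->(a)")`. The sketch below uses a few made-up trips. Following the hint in (d), it ranks stations by incoming trips divided by outgoing trips, lowest first. It also skips the degenerate case a == c, which the raw motif may return and which would need an extra filter in GraphFrames.

```python
from collections import Counter

# Made-up trips as (StartStation, EndStation) pairs; in the assignment these
# come from trip.csv and become the GraphFrame edges (src, dst).
trips = [
    ("A", "B"), ("A", "B"), ("A", "C"),
    ("B", "C"), ("C", "A"), ("B", "D"), ("D", "B"),
]


def degrees(edges):
    """(b) Incoming and outgoing trip counts per station."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return in_deg, out_deg


def top_routes(edges, n=10):
    """(c) Most frequent (src, dst) pairs."""
    return Counter(edges).most_common(n)


def one_way_stations(edges, n=10):
    """(d) Lowest in/out ratio first: many departures, few arrivals."""
    in_deg, out_deg = degrees(edges)
    ratios = {s: in_deg[s] / out_deg[s] for s in out_deg}
    return sorted(ratios.items(), key=lambda kv: kv[1])[:n]


def open_triangles(edges):
    """(e) (a, b, c) with a->b and b->c but no direct c->a edge."""
    adj = {}
    for src, dst in set(edges):
        adj.setdefault(src, set()).add(dst)
    result = []
    for a, bs in adj.items():
        for b in bs:
            for c in adj.get(b, ()):
                # skip round trips a->b->a and keep only open paths
                if c != a and a not in adj.get(c, set()):
                    result.append((a, b, c))
    return sorted(result)
```

On the sample trips, `open_triangles(trips)` returns `[('A', 'B', 'D'), ('D', 'B', 'C')]`, and `one_way_stations(trips, 1)` puts station A first, with 1 arrival against 3 departures.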

Q4. Consider the Movie Similarities code and problem discussed in class (Session 4). Provide a brief write-up of the problem, the steps needed to arrive at the solution (a recommendation system), and how exactly those steps are implemented in the code. In doing so, also explain what each line of code does (it is not sufficient to explain what each block of code does; you must explain each line).
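The class code for Q4 is not reproduced here. Assuming it follows the common item-based approach (group ratings by user, form every pair of movies each user rated, then score each pair by the cosine similarity of its co-ratings), the core computation can be sketched in plain Python as below. In Spark the same steps are typically a self-join of the ratings RDD on user ID, a filter that keeps each movie pair once, `groupByKey` on the pair and a `mapValues` that computes the score. The ratings are made up.

```python
import math
from collections import defaultdict
from itertools import combinations

# Made-up (userID, movieID, rating) triples.
ratings = [
    (1, 10, 5.0), (1, 20, 4.0), (1, 30, 1.0),
    (2, 10, 4.0), (2, 20, 5.0),
    (3, 10, 1.0), (3, 30, 5.0),
]


def cosine(pairs):
    """Cosine similarity of two rating vectors given as (x, y) pairs.

    Returns (score, number_of_co_ratings).
    """
    sum_xx = sum(x * x for x, _ in pairs)
    sum_yy = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    denom = math.sqrt(sum_xx) * math.sqrt(sum_yy)
    return (sum_xy / denom if denom else 0.0), len(pairs)


def movie_similarities(rows):
    # group each user's (movie, rating) pairs, like keying ratings by user ID
    by_user = defaultdict(list)
    for user, movie, rating in rows:
        by_user[user].append((movie, rating))
    # collect the co-ratings of every movie pair, each unordered pair once
    pair_ratings = defaultdict(list)
    for movies in by_user.values():
        for (m1, r1), (m2, r2) in combinations(sorted(movies), 2):
            pair_ratings[(m1, m2)].append((r1, r2))
    # score each pair from its list of co-ratings
    return {pair: cosine(rs) for pair, rs in pair_ratings.items()}
```

A pair rated by a single user scores a perfect 1.0 here, as (20, 30) does, which is why implementations of this approach commonly also require a minimum number of co-ratings before trusting a score.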

Verified Expert

The solution contains two document files. The first is the group assignment on the analysis of the U.S. aircraft dataset; it covers all five parts with graphs, analysis and a summary. The second is the solution to the Spark programming questions and contains the code as well as screenshots. Three code files are also attached.


Reviews

inf1792801

2/20/2018 4:39:27 AM

Please send us the code file too. These graphs were not produced with Spark. For this order code (IAH198), the expert has to use only the dataset named below, and I request that no information from IAH197 be used. The Airline_Proj_r.csv data can be extracted from the same source, but I need the code file the expert worked on, and the report has to be prepared from it.

inf1792801

2/20/2018 4:31:15 AM

If need be, MySQL can be used, but Spark is the preferred medium. I need help with the following assignments. In the BDM assignment, modules 3 and 4 need to be worked on. The HW assignment questions need to be done fully. In the group project proposal, only parts 3 and 4 need to be worked on, and the new assignment has to be done fully. Kindly let the instructor know. Also, in the group project proposal, all the steps have to be documented. There is no word count, but documentation that is as crisp as possible is encouraged. On 4th Jan, the client commented that all the steps, including the execution commands, have to be documented, the output screens have to be attached, and the code files have to be submitted along with the documentation. The data file has been provided, so there is no need to download it from the source. Kindly make a note of this; the data will be shared via the drive. Kindly share the drive link so that I can upload my data and the proposal. Remember that all the steps need to be documented, all the output screens have to be attached in the document, and all the code files need to be shared along with the document files.

All rights reserved. Copyright ©2019-2020 ExpertsMind IT Educational Pvt Ltd