How many records are in each table in the database

Assignment Help Computer Engineering
Reference no: EM133776509

Homework: Analyzing Massive Data from the Airline Industry

In this homework, you will work with data from the airline industry. You must download it and add it to your working directory to write SQL queries against it. You will write some SQL directly. On certain questions, you will need to integrate Python and run SQL queries using pandas that can be returned as a dataframe and analyzed with the tools and techniques from Python. Note that there are multiple tables in the database: flights, airports, planes, weather, airlines.

Using Jupyter notebook, answer each of the following questions. This is an individual homework. You must submit your own work. You will do your work in Jupyter notebook and must show the code you used to get the answer. To complete the homework you must download your Jupyter notebook with code and answers as an html file and upload the notebook in html format. All questions should be answered fully when a text answer is required and all code must be shown to indicate how you arrived at the result.

Your report submission should be formatted using Markdown and code in Jupyter notebook as follows:

The name of the homework as an H1, your name as an H2, and the date as an H3 all in the first cell block. Add a horizontal line/rule after the cell. The questions should then be answered in individual cells indicated by Q1, Q2, Q3...etc. in H4 format and separated from other questions by horizontal lines/rulers.

Note you must write out the SQL query used to generate the answer for all questions unless otherwise specified.

A. How many records are in each table in the database?

B. Did any flights depart and arrive exactly on time?

C. What is the mean arrival delay among flights?

D. How many flights had above average arrival delays?

E. What is the relationship between temperature and wind speed? Do changes in the mean temperature throughout the year (by month) move reflect changes in mean wind speed?

Print the first 5 rows of flights and airlines. Does the flights table contain the full name of the airline? If so, print them. If not, write a query to return the first 10 records in the flights data with the following columns: the origin, destination, carrier, flight number, and full name of the airline for each flight in a single table.

Your supervisor is interested in the airlines that offer the most flights in peak periods. Write a query that returns the number of flights each airline operates in December. The results should be a single table that contains five columns: carrier, month, name, and num_flights. The results should be limited to 10 and shown in descending order by number of flights.

Use pandas to connect to the database. Define a function that takes one argument--month_num--called num_flights_fun that uses pandas' `.read_sql_query()` and runs a query that returns the number of flights each airline operates in whatever month is specified in the function call. The result should be returned as a pandas dataframe that contains five columns: carrier, month, name, and num_flights.

Test your function to ensure it works. Create a new dataframe called sept that contains the results with only records from September and a data frame called jan that contains the results with only records from January. The only change you should make to get the results is adjusting the value of the input parameter you are passing in the function call in num_flights_fun. Print out the last five rows of sept and jan.

Using sept and jan, print the top five airlines with the greatest number of flights in September and in January, in descending order. Are the top five airlines the same in September and January? If so, what are the airlines? If not, which airlines differ from September to January?

A. Create a new function using pandas called num_flights_fun_all that runs the same query as above but returns the records for all months. The function should not take any arguments.

B. Use num_flights_fun_all to create a dataframe called months_df. The dataframe should have carrier, month, name, and num_flights columns and contain the number of flights that each airline made in each month. Show the first 15 lines of the dataframe to ensure the results are correct.

C. Use months_df to plot the number of flights each airline made for each month using altair, plotly, or seaborn.

D. Write a query that returns the each airlines' mean arrival delay across all of its flights. The result should have two columns: mean_delay and the full name of the airline. Which airline has the highest average delay?

E. Use pandas to query the data and return a dataframe to plot the data. Use either altair, seaborn, or plotly and plot the mean delay for each airline in a bar plot. The plot should have the airline name on the y and mean delay on the x and be ordered by mean delay. The bars should be colored by whether the airlines' flights arrive, on average, early or late. Which airline, on average, has the earliest arrivals?

Reference no: EM133776509

Questions Cloud

Identify barriers or challenges that you foresee for patient : Can care teams use the CCCR model? Identify barriers or challenges that you foresee for patients and care teams. How can barriers be overcome?
Discuss the impact of copays and deductibles on demand : Health care has both a consumption component and investment component? What is the investment component and can you provide an example?
What patterns of inequality can you identify from the slides : What patterns of inequality can you identify from these slides? Which groups seemed to be most affected by income and wealth gaps?
What are the unique characteristics of health care : HMGT 435 Healthcare Economics, University of Maryland Global Campus - Why do we care that health care spending is growing faster than growth in the overall
How many records are in each table in the database : How many records are in each table in the database? Did any flights depart and arrive exactly on time? What is the mean arrival delay among flights?
Positive working relationship with their employees : What is one way managers can ensure a positive working relationship with their employees?
Describe in initial post within professional environment : How might your classmate use a resume building skill they did not describe in their initial post within a professional environment?
How could social media and nontraditional media be used : How could social media and nontraditional media be used to supplement a traditional media campaign in this circumstance?
Discuss specific strategies : Discuss specific strategies or examples where this partnership approach can be implemented in your coaching or mentoring practices

Reviews

Write a Review

Computer Engineering Questions & Answers

  Mathematics in computing

Binary search tree, and postorder and preorder traversal Determine the shortest path in Graph

  Ict governance

ICT is defined as the term of Information and communication technologies, it is diverse set of technical tools and resources used by the government agencies to communicate and produce, circulate, store, and manage all information.

  Implementation of memory management

Assignment covers the following eight topics and explore the implementation of memory management, processes and threads.

  Realize business and organizational data storage

Realize business and organizational data storage and fast access times are much more important than they have ever been. Compare and contrast magnetic tapes, magnetic disks, optical discs

  What is the protocol overhead

What are the advantages of using a compiled language over an interpreted one? Under what circumstances would you select to use an interpreted language?

  Implementation of memory management

Paper describes about memory management. How memory is used in executing programs and its critical support for applications.

  Define open and closed loop control systems

Define open and closed loop cotrol systems.Explain difference between time varying and time invariant control system wth suitable example.

  Prepare a proposal to deploy windows server

Prepare a proposal to deploy Windows Server onto an existing network based on the provided scenario.

  Security policy document project

Analyze security requirements and develop a security policy

  Write a procedure that produces independent stack objects

Write a procedure (make-stack) that produces independent stack objects, using a message-passing style, e.g.

  Define a suitable functional unit

Define a suitable functional unit for a comparative study between two different types of paint.

  Calculate yield to maturity and bond prices

Calculate yield to maturity (YTM) and bond prices

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd