Describing how you designed your network

Reference no: EM131451328

Program - Neural Networks

For the last programming project of the semester, you will use off-the-shelf neural network software to investigate airline lateness statistics. You are given a data file (.csv format) containing data on delays from various causes at the 29 largest airports in the United States. We are interested in finding out whether the pattern of causes of delays is sufficient to identify the airport.

THE SOFTWARE
The WEKA package is available in all Flarsheim labs (and is also a free download if you want to install it on your own system). It has modules for many types of AI models, including Bayesian networks and neural networks, the focus of this assignment. WEKA allows you to build and train a network by specifying the configuration and the data file; there is no need to write your own artificial-neuron or backpropagation code.

THE DATA FILE
The file, airlines.csv, is taken from https://think.cs.vt.edu/corgis/csv/airlines/airlines.html. The data dictionary, describing the format and meaning of each field, is also on that page. In short, the file contains, for each airport, the number of delays:
- due to the airline;
- due to late aircraft;
- due to issues with the aviation system itself (congestion, air traffic control, etc.);
- due to security concerns; and
- due to weather.
In addition, it lists the number of flights canceled, delayed, or diverted. For delays, it also lists the total minutes of delay for each cause.

The file also contains the number of carriers, total number of flights, and number of on-time flights per airport. The 3-letter airport code is also given; this is the output (dependent) variable.

The data on number of carriers per airport should be screened out of your input data and not used as input for your network. This is because in some cases this is enough to uniquely identify the airport; we do not want our network to bypass the bulk of the data. Likewise, the name of the airport should not be used as input.

INPUT TO YOUR NETWORK
Use the numeric data for number and amount of delays, diversions, etc., for each cause. You will want to normalize this data, either by number of delays or number of flights.
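As a minimal sketch of normalizing by number of flights, the function below turns raw per-cause delay counts into fractions of all flights. The cause names and the sample numbers are made up for illustration; they are not taken from airlines.csv.

```python
def delay_proportions(counts, total_flights):
    """Convert raw per-cause delay counts into fractions of all flights."""
    if total_flights <= 0:
        # A row with no flights carries no usable signal; return zeros.
        return {cause: 0.0 for cause in counts}
    return {cause: n / total_flights for cause, n in counts.items()}


# Hypothetical row: names and values are illustrative only.
row = {"carrier": 120, "late_aircraft": 200, "nas": 150, "security": 2, "weather": 28}
print(delay_proportions(row, total_flights=2000))
```

Dividing by the total number of delays instead would work the same way; only the denominator changes.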

Scaling the data: You may need to adjust the scale of your data (e.g. record delays in hours rather than minutes) so that all inputs are of approximately the same magnitude. If inputs vary over several orders of magnitude (as this data does), the network requires much more training, and we only have so much data. Therefore, adjusting the data so that numbers are proportions (floats in [0.0, 1.0]) rather than raw counts can provide more efficient learning from the same data.

Another option is to code each variable separately as a z-score, that is, the number of standard deviations that item lies above or below the mean. (Z-scores below the mean are negative and those above the mean are positive; thus a z-score of -0.27 means an item is 0.27 standard deviations below the average for that variable, and a z-score of 1.12 means it is 1.12 standard deviations above the mean.) The advantage of this is that all data items are on the same scale (mean of 0, standard deviation of 1), even if some variables have a characteristic range of 0.01 - 0.10 and others have a range of 1,000 - 100,000.
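The z-score coding described above can be sketched as follows. This is an illustration using Python's standard library, not part of WEKA; it uses the population standard deviation, which is an assumption (the sample standard deviation would serve equally well for scaling).

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return each value as standard deviations above (+) or below (-) the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed choice)
    if sigma == 0:
        # A constant column has no spread; map every item to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative values only: mean is 5, standard deviation is 2.
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```

Each column is scored on its own, so a column of minutes and a column of proportions both end up with mean 0 and standard deviation 1.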

Exclude from input: name of airport, airport code (this is the output), month, year, month name, year/month code, and number of carriers.
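One way to screen out the excluded fields before training is sketched below. The column names here are hypothetical stand-ins; the real headers are given in the data dictionary for airlines.csv, and the sample row is made up.

```python
import csv
import io

# Hypothetical column names; replace with the headers from the data dictionary.
EXCLUDED = {"airport_name", "airport_code", "month", "month_name",
            "year", "year_month", "carriers"}
TARGET = "airport_code"


def split_features(row):
    """Return (numeric input features, airport code label) for one CSV row."""
    features = {k: float(v) for k, v in row.items() if k not in EXCLUDED}
    return features, row[TARGET]


# Made-up sample data standing in for the real file.
sample = io.StringIO(
    "airport_code,airport_name,year,month,carriers,flights_total,delays_weather\n"
    "ATL,Atlanta,2003,6,11,30060,328\n"
)
for row in csv.DictReader(sample):
    print(split_features(row))
```

The airport code is kept as the label only; it never appears among the inputs.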

OUTPUT FROM YOUR NETWORK

Your network should have 29 output neurons, 1 for each airport. The network's response is the airport whose output neuron has the largest value.
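Picking the response from the output neurons amounts to an argmax. A minimal sketch, assuming the outputs arrive as a list in the same order as a list of airport codes (both names are illustrative, and only three airports are shown for brevity):

```python
def predict_airport(outputs, airport_codes):
    """Return the airport code whose output neuron has the largest activation."""
    if len(outputs) != len(airport_codes):
        raise ValueError("need exactly one output neuron per airport")
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return airport_codes[best]


# Three airports instead of 29, with made-up activations.
print(predict_airport([0.1, 0.7, 0.2], ["ATL", "BOS", "ORD"]))
```

If two neurons tie, this version returns the first one in list order.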

NEURAL NETWORK CONFIGURATION

This is your playground! The general approach is to start with the input neurons and a single neuron in the hidden layer. Randomly select a subset of the data (say, 5% of it) to withhold for testing (WEKA can do this automatically), train the network on the rest, then test it on the withheld data. At first, it'll probably be terrible. Then add a second neuron to the hidden layer, select a new subset of the data, retrain the network from scratch, and check the results. Continue adding neurons to the hidden layer until the network consistently predicts all withheld data, or until adding more neurons leads to a decrease in performance on the test set.
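The search procedure above can be sketched as a loop. In this sketch `train_and_score` is a hypothetical callback standing in for whatever builds, trains, and tests a fresh network (for example, a WEKA run) and returns test accuracy in [0, 1]; `max_hidden` and `patience` are assumed stopping parameters, not part of the assignment.

```python
import random


def holdout_split(rows, test_fraction=0.05, rng=None):
    """Randomly withhold a fraction of the rows for testing."""
    rng = rng or random.Random()
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def grow_hidden_layer(rows, train_and_score, max_hidden=30, patience=3, seed=0):
    """Add hidden neurons one at a time, retraining from scratch each round."""
    rng = random.Random(seed)
    best_size, best_acc, rounds_without_gain = 0, -1.0, 0
    for n_hidden in range(1, max_hidden + 1):
        train, test = holdout_split(rows, rng=rng)   # new subset every round
        acc = train_and_score(n_hidden, train, test)  # must start a fresh network
        if acc > best_acc:
            best_size, best_acc, rounds_without_gain = n_hidden, acc, 0
        else:
            rounds_without_gain += 1
        if acc == 1.0 or rounds_without_gain >= patience:
            break
    return best_size, best_acc
```

Because the held-out subset changes every round, a single drop in accuracy can be noise; `patience` waits a few rounds before concluding that more neurons are not helping.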

That uses one hidden layer. Each hidden neuron computes a weighted sum of the inputs and passes it through a nonlinear activation function (a sigmoid in WEKA's MultilayerPerceptron), and each output neuron does the same with the outputs of the hidden neurons. Without the nonlinearity, the whole network would collapse into a single linear model. You can have multiple layers of hidden neurons. It's probably best not to get too carried away; this data probably won't support more than 2 hidden layers. (The more hidden layers, the more training data needed.) And there's no requirement of multiple hidden layers; in general, a network should be as complex as needed to perform well, and no more.
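To make the single-hidden-layer computation concrete, here is a small forward-pass sketch. It assumes sigmoid activations on both layers, as in a typical multilayer perceptron; the weights and biases are placeholders that training would normally set, and WEKA performs this computation for you.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def layer(inputs, weights, biases):
    """One layer: weighted sum per neuron, then the sigmoid nonlinearity."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Inputs -> hidden layer -> output layer (one activation per airport)."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)


# Two inputs, one hidden neuron, two output neurons; all weights are placeholders.
print(forward([0.2, 0.4], [[1.0, -1.0]], [0.0], [[2.0], [-2.0]], [0.0, 0.0]))
```

Adding a second hidden layer would mean one more `layer` call between the two shown here.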

So try some different configurations if you like. The key point is that any time the network configuration is changed, the entire network must be re-initialized and trained from the beginning, particularly if different data items are selected for testing. (Otherwise the network is partly trained on test data, which invalidates any test results.)
