Define a set of language primitives to be used by gp

Assignment Help Computer Engineering
Reference no: EM131429449

Hand in: A printed report about your experiments. Electronic copies of your report, source and parameter files (online hand-in). Use Excel to create performance graphs, to be included in your report. Submit electronic copies of all data and report files using the "submit5p71" script on sandcastle. It will submit the entire directory substructure of your assignment.

System: You are free to use any GP system you want (ECJ, lilGP, Open Beagle, RobGP, etc.).The system that will be discussed in class is lilGP 1.1. This is a C-based GP system with the same functionality as Koza's GP system in his first and second books. It also has many additional enhancements (strong typing, multiple populations, etc.). However, ECJ has been the most popular choice for students, as it is more contemporary, better maintained, has more features, and is documented. See the end of the assignment for the location of these systems.

A. Symbolic regression

This question will introduce you to genetic programming. You are to take an existing symbolic regression example, compile it for the platform you are using, and execute it to get some results. The lilGP, ECJ, and Open Beagle systems all include this example. Once you get it running, modify the training initialization setup, by reading in the training points from a text file. You should have the total number of points to process and the name of the text file as parameters in the "input file" user parameter file. This will let you run your system on new data sets, without having to recompile the GP system. (Note that this modification will be useful for part B below).

Try the system a few times, on new data you have put into a new training file. Also try it on a few variations of GP languages. For example, try it on the default language, as well as a stronger language (add new functions).

Decide on two comparative experiments to do, using the same training data. You might compare the GP language variants, or selection strategy, or fitness evaluation scoring. Do 10 runs using each of these experimental variations. Your report should compare these two experiments. A summary table consisting of the average over 10 runs of all the population averages, and average "best" at the end of each run, should be computed for each experiment, and put into a table. Performance graphs showing the overall run averages of the population average and best per generations should also be plotted in Excel.

B. Auto MPG Data Set

The Auto MPG data set is a famous classification problem from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/Auto+MPG

You are to use genetic programming to evolve a rule that predicts the fuel consumption (mpg, miles per gallon) for a given var. The data set has 398 cases, all with the following numeric attributes:

1. mpg: continuous
2. cylinders: multi-valued discrete
3. displacement: continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin: multi-valued discrete
9. car name: string (unique for each instance)

Here are three examples from the database, for three different cars:

25.0 4 113.0 95.00 2228. 14.0 71 3 "toyota corona"
25.0 4 98.00 ? 2046. 19.0 71 1 "ford pinto"
19.0 6 232.0 100.0 2634. 13.0 71 1 "amc gremlin"

Attributes 1, 3, 4, 5, and 6 are continuous (floating point); attributes 2, 7, 8 are discrete (integers). Attribute 9 is a string of the car name. You should ignore this value, as it will not be useful for GP to use. (You can load the data into Excel, and delete that column). Once this is done, then all values in the data are numeric (with an exception mentioned below). Since all the values are numeric, it is not necessary to use strongly typed GP. (However, strong typing can still be useful.)

Attribute #1 is the MPG for the car. It is the "answer value" for a car example. We want to evolve a GP program will evolve an expression that uses attributes 2 through 8, to determine attribute 1. For example, if we supply an evolved GP expression with the attribute values for the Toyota Corona, we want the GP program to return a value close to 25.0. Of course, do not give your GP tree expression access to the MPG value, because it is the solution (that would be like giving students an exam key during an exam!).

The Ford Pinto example has a missing horsepower ("?"). There are 6 examples in the database that have missing values like this. Please delete these examples the database.

Here is a recommended approach to follow.

1. Read the database into a large table, one row per example. You should randomly shuffle the rows. Then split the table into 2 independent sets of examples - a training set, and a testing set. The exact size of these sets is up to you, and may have to be experimentally determined. Be aware that machine learning is usually affected by the size of training sets. For example, sometimes smaller training sets are more effective.

2. Define a set of language primitives to be used by GP. These primitives should work sensibly on the input data. A possible set of primitives to consider are:
a. functions: arithmetic operators, min and max, relationals, if-then-else,...
b. terminals: attribute variables (7), and ephemeral random constants.

3. The top-level wrapper that executes each classifier program works as follows. The 7 attribute values, from a row in the table, are copied into a set of 7 variables accessible by the GP expression, and which will take the form of leaf terminals within the tree. The program is executed by the GP system interpreter using these values. The resulting answer is recorded. The fitness function will then compare this answer to the known quality solution from the example table, and the fitness score will be appropriately updated (see next point). So in other words, the following pseudo-code shows the general execution strategy:

float cylinders, displacement, horsepower, ..., origin; int ans;

loop for i=0 to N-1 : // N training examples from table cylinders = training[i][1];
displacement = training[i][2]; horsepower = training[i][3];
(... etc. all the way to training[i][7])
computed_mpg = execute_tree(t); // t is the current GP tree
// now the predicted quality found in ans can be compared to
// real recorded quality of that example,
// perhaps stored in integer array ans[i] (i.e. training[i][0]).

4. Fitness is based upon how well the GP answers match the "real" mpg for cars as recorded in the database. There are a few options.
(i) Compute a sum of absolute error between the actual mpg and target mpg for a car:
∑ abs( computed_mpg - ans[i] ) This summation is computed over all the training examples.
(ii) Compute the sum of errors squared:
∑ (computed_mpg - ans[i]) ^ 2 This tends to punish large differences in predicted mpg.

Note that both these formula prefer smaller sums (an exact match would have a sum of 0.0).

5. At the end of the run, you must test the best GP solution by running it on the testing set (i.e. all the examples not used for training). This gives an idea of the generality of your evolved solution when given new data it has never seen. Record the test performance of each run. This information will be included in the report.

6. Do 10 runs, each with different random number seeds and shuffled data sets. Collate the performance of your runs together. Your report should have a table summarizing all the runs, for both the training scores and testing scores. You should report the average best solution scores, as well as the scores for the overall best solution from all runs. Include a confusion matrix with each experiment. Also include performance

graphs showing the population fitness and best fitness plotted over generations (averaged over for all the runs).

7. Your report should describe all relevant aspects of your experiment. The goal of the report is to give enough information so that someone else could reproduce your experiments. Describe your problem set (and its source), and describe what your GP program is attempting to do. Be sure to describe your GP language, all GP parameters, and your fitness evaluation methodology. Include source code of evolved GP programs for your 2 best rules obtained. If they are large, they can be put into an appendix of your report. Include a discussion on your results. Tables and graphs do not speak for themselves, and so you must describe and discuss the important features in them. Also include a conclusion section that summarizes the strengths or weaknesses of your experiment.

The report format should be:

1. Introduction: problem description
2. Experiment details:
a. parameters for GP
b. fitness function
c. GP language
d. format of data
e. variations of experiments
3. Results
a. tables of results (best, avg, etc.), confusion matrices
b. graphs
4. Analytical discussion of results
5. Conclusions
6. Bibliography (data set URL, references,...)

Please write separate reports for parts A Regression and B Auto MPG.

https://drive.google.com/file/d/0B3-8Q4yDXVU8SkNDRW4xT0gzWUk/view

Attachment:- Genetic programming.zip

Attachment:- Tutorial.txt

Reference no: EM131429449

Questions Cloud

Why are ethical behavior-social responsibility : Why are ethical behavior, social responsibility, and sustainability an important focus of corporate governance? Why do you believe some managers embrace this while others resist? What is the virtuous circle of corporate social responsibility? Identif..
Develop a coherently structured paper with an introduction : Revise, using feedback from the professor and classmates, your Persuasive Paper Part I: A Problem Exists. Develop Part 2: Solution to Problem and Advantages . Include a defensible, relevant thesis statement clearly in the first paragraph. (The thes..
What theory of insider trading should the prosecution : Business Ethics R. Foster Winans, a reporter for the Wall Street Journal, was one of the writers of the “Heard on the Street” column, a widely read and influential column in the Journal. This is one of the more famous insider trading cases of recent ..
New neighborhood completing long-term school assignment : Consider a project, such as moving to a new neighborhood completing a long-term school assignment, or even cleaning your bedroom. Develop a set of activities necessary to accomplish that project and then order them in a precedence manner to create se..
Define a set of language primitives to be used by gp : COSC 5P71 GP: Assignment - Define a set of language primitives to be used by GP. These primitives should work sensibly on the input data and you should randomly shuffle the rows. Then split the table into 2 independent sets of examples - a training ..
Design a set of lending or borrowing positions : Design a set of lending/borrowing positions in the interbank money market that will generate a risk-free one-year-ahead forward rate quote stated in term of USDs per CLP. Explain why this is a risk-free quote
What is the time estimate of an activity : What is the time estimate of an activity in which the optimistic estimate is 2 days, pessimistic is 12 days, and most likely is 4 days? Show your work. What is the time estimate of an activity in which the optimistic time is 5 days the likely time is..
Project schedule changes : You have kicked off the District 4 Production Warehouse Move project, your contractors are in place and working on receiving the proper building permits. Build the extra time into your schedule by doubling the installation work timelines for both the..
Top-down budgeting and bottom-up budgeting concepts : What is the difference between top-down budgeting and bottom-up budgeting concepts? When would you use one concept versus another? Based on your experience, which one is more effective and why? What is meant by crashing a project and what effect does..

Reviews

len1429449

3/16/2017 6:02:25 AM

"If anything is not clear, email me back ASAP Please" I have sent you the assignment, the marking scheme, and a ZIP File with the pictures to choose from and three example reports to take a look at. The program to work on is ECLIPSE ECJ System.. I attached the an important example report to look at to this email, and assignment 1 as there are some content that might be needed in this assignment. No there is no word limit, just make sure to follow the marking scheme But I think 8 or 9 pages would be enough.

Write a Review

Computer Engineering Questions & Answers

  Mathematics in computing

Binary search tree, and postorder and preorder traversal Determine the shortest path in Graph

  Ict governance

ICT is defined as the term of Information and communication technologies, it is diverse set of technical tools and resources used by the government agencies to communicate and produce, circulate, store, and manage all information.

  Implementation of memory management

Assignment covers the following eight topics and explore the implementation of memory management, processes and threads.

  Realize business and organizational data storage

Realize business and organizational data storage and fast access times are much more important than they have ever been. Compare and contrast magnetic tapes, magnetic disks, optical discs

  What is the protocol overhead

What are the advantages of using a compiled language over an interpreted one? Under what circumstances would you select to use an interpreted language?

  Implementation of memory management

Paper describes about memory management. How memory is used in executing programs and its critical support for applications.

  Define open and closed loop control systems

Define open and closed loop cotrol systems.Explain difference between time varying and time invariant control system wth suitable example.

  Prepare a proposal to deploy windows server

Prepare a proposal to deploy Windows Server onto an existing network based on the provided scenario.

  Security policy document project

Analyze security requirements and develop a security policy

  Write a procedure that produces independent stack objects

Write a procedure (make-stack) that produces independent stack objects, using a message-passing style, e.g.

  Define a suitable functional unit

Define a suitable functional unit for a comparative study between two different types of paint.

  Calculate yield to maturity and bond prices

Calculate yield to maturity (YTM) and bond prices

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd