What is the equation of the line that separates the two classes?

Reference no: EM13973843

1. Consider the points/classes below and the perceptron algorithm taught in class.

Class 0:

x1 = (-2, 1)

x2 = (1,3)

Class 1:

x3 = (2,0)

x4 = (2,2)

(a) What is the equation of the line that separates the two classes?

(b) Graph this equation.
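The version of the perceptron taught in class is not given here, so the following is only a minimal sketch under stated assumptions: weights and bias start at zero, the learning rate is 1, the unit outputs 1 when the activation is strictly positive, and the examples are presented in the order x1, x2, x3, x4. The function name `train_perceptron` and its parameters are illustrative. If the course uses a different threshold, initial weights, or learning rate, the resulting line may differ.

```python
# Sketch only: zero initial weights, learning rate 1, and the
# "output 1 if activation > 0" rule are assumptions, not the course's
# confirmed settings.

def train_perceptron(samples, max_epochs=100, lr=1.0):
    """Train on (point, label) pairs with labels 0/1.

    Returns (weights, bias, epochs_used), or None if no error-free pass
    over the data occurs within max_epochs.
    """
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            predicted = 1 if activation > 0 else 0
            delta = target - predicted  # -1, 0 or +1
            if delta != 0:
                # Move the boundary toward (or away from) the misclassified point.
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, epoch
    return None


# Problem 1 data, presented in the order x1, x2, x3, x4.
samples = [((-2, 1), 0), ((1, 3), 0), ((2, 0), 1), ((2, 2), 1)]
result = train_perceptron(samples)
if result is not None:
    w, b, epochs = result
    # The decision boundary is the line w[0]*x + w[1]*y + b = 0.
    print(f"{w[0]}*x + {w[1]}*y + {b} = 0 (after {epochs} passes)")
```

Whatever line comes out, it is worth checking by hand that both class 0 points fall on the non-positive side and both class 1 points on the positive side before graphing it.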

2. Consider the same initial problem as described in (1) above, BUT change the order in which the examples are presented to the perceptron to the following order:
x1, x3, x2, x4

(a) What is the equation of the line that separates the two classes?

(b) Graph this equation.

(c) Give a single NEW point that causes the perceptron NOT TO CONVERGE, and clearly show this point on the graph in (b) above.
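One way to explore parts (a) and (c) is to rerun the training loop with the new presentation order and then with one extra point added. The sketch below again assumes zero initial weights, learning rate 1, and an "output 1 if activation > 0" rule; `perceptron_converges` is an illustrative name. The extra point (2, 1) labeled class 0 is just one possible choice: it is the midpoint of x3 and x4, both class 1, so no straight line can put x3 and x4 on one side and (2, 1) on the other, and the data is no longer linearly separable.

```python
# Sketch only: initial weights, learning rate and threshold are assumptions.

def perceptron_converges(samples, max_epochs=1000):
    """Return (w, b) after the first error-free pass, else None."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), target in samples:
            predicted = 1 if w[0] * x + w[1] * y + b > 0 else 0
            delta = target - predicted
            if delta:
                w = [w[0] + delta * x, w[1] + delta * y]
                b += delta
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None  # no error-free pass within max_epochs


# Presentation order x1, x3, x2, x4.
reordered = [((-2, 1), 0), ((2, 0), 1), ((1, 3), 0), ((2, 2), 1)]

# Hypothetical extra point: class 0 at the midpoint of x3 and x4.
with_extra = reordered + [((2, 1), 0)]

print("reordered:", perceptron_converges(reordered))
print("with extra point:", perceptron_converges(with_extra))
```

A result of `None` only says that no error-free pass happened within `max_epochs`; the midpoint argument above is what shows that no number of passes would be enough for this particular extra point.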

3. Consider the points/classes below and the perceptron algorithm taught in class.

Class 0:

x1 = (0, 0, -4, 1)

x2 = (2, 3, -4, 1)

x3 = (12, 14, -4, 1)

Class 1:

x4 = (0, 5, 1, 2)

x5 = (2, 3, 5, 1)

x6 = (12, 14, 5, 1)

(a) What is the equation of the hyperplane that separates the two classes?
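In four dimensions the separator cannot be graphed directly, so it helps to check a candidate numerically. The sketch below tests whether a hyperplane w·x + b = 0 puts every class 0 point on the negative side and every class 1 point on the positive side. The helper `separates` and the candidate itself are illustrative assumptions: the candidate is based on the observation that the third coordinate is -4 for every class 0 point and at least 1 for every class 1 point. It is one valid separator, not necessarily the one the in-class perceptron would produce.

```python
def separates(w, b, class0, class1):
    """True if w.x + b < 0 for all class 0 points and > 0 for all class 1 points."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    return all(score(x) < 0 for x in class0) and all(score(x) > 0 for x in class1)


class0 = [(0, 0, -4, 1), (2, 3, -4, 1), (12, 14, -4, 1)]
class1 = [(0, 5, 1, 2), (2, 3, 5, 1), (12, 14, 5, 1)]

# Hypothetical candidate: use only the third coordinate, with the boundary
# halfway between -4 and 1, i.e. x3 + 1.5 = 0.
candidate_w = (0, 0, 1, 0)
candidate_b = 1.5

print(separates(candidate_w, candidate_b, class0, class1))
```

The same check can be applied to whatever weights the perceptron run actually produces.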

4. Select the statement that best describes BAYESIAN reasoning as I emphasized it in class. BAYESIAN REASONING:

(a) works with 2 or more classes

(b) can accommodate missing or partial information

(c) runs much faster than a perceptron

(d) needs far fewer examples than the J48 algorithm

5. A person has swollen glands, represented as symptom G. There are three possible diseases this person may have: disease A, disease B, or disease C.

The relevant prior probabilities are given below:

P(A) = 0.35

P(B) = 0.35

P(C) = 0.30

The conditional probabilities are as follows:

P(G|A) = 0.015

P(G|B) = 0.010

P(G|C) = 0.020

Given symptom G, what is the probability that the patient has

(a) Disease A

(b) Disease B
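The calculation here is an application of Bayes' rule, P(D|G) = P(G|D) P(D) / P(G), where P(G) is the sum of P(G|D) P(D) over the three diseases. The sketch below assumes the diseases are mutually exclusive and that A, B and C are the only possibilities (the priors sum to 1); the function name `posteriors` is illustrative.

```python
# Priors and likelihoods as given in the problem statement.
priors = {"A": 0.35, "B": 0.35, "C": 0.30}
likelihoods = {"A": 0.015, "B": 0.010, "C": 0.020}  # P(G | disease)


def posteriors(priors, likelihoods):
    """Bayes' rule, assuming mutually exclusive and exhaustive hypotheses."""
    joint = {d: priors[d] * likelihoods[d] for d in priors}  # P(G and d)
    evidence = sum(joint.values())                           # P(G)
    return {d: joint[d] / evidence for d in joint}


post = posteriors(priors, likelihoods)
for disease, p in post.items():
    print(f"P({disease}|G) = {p:.4f}")
```

Note that the posterior ranking need not match the prior ranking: a disease with a lower prior can still end up most probable if the symptom is more likely under it.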

6. Use the data in the exam folder entitled credit-g and Weka. Under the classifier, select BAYES and then NAIVE BAYES, with a percentage split of 66% (i.e., roughly 2/3). Run the Naive Bayes classifier:

(a) What is the recall?

(b) What is the precision?

(c) Do the identical problem EXCEPT using the J48 classifier with the same data and percentage split. What is the recall now?

(d) What is the precision?
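Weka reports precision and recall directly, but it can be useful to recompute them from the confusion matrix to be sure which class is being treated as "positive". The sketch below uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The counts are made-up placeholders, not results from credit-g, and `precision_recall` is an illustrative name.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for one class from its confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only (NOT actual Weka output):
# 80 true positives, 20 false positives, 40 false negatives.
precision, recall = precision_recall(tp=80, fp=20, fn=40)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

When comparing Naive Bayes with J48, make sure both sets of numbers refer to the same class (or both to the weighted average) so the comparison is like for like.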


The problems under 7 below are real. Don't expect clean answers and obvious explanations. There may be missing data, and techniques like ID3 may actually run out of attributes and NEVER create a single homogeneous classification. Be careful.

Here's how this will be graded: there is no "right" answer I am seeking. I don't have a 'key' for problem 1A where you either get my answer or you miss it. Rather, I am going to grade HOW you went about solving the problem and whether what you did is reasonable. If you blindly apply any technique without any type of analysis, that's reckless in real life as well as on the final. Instead, you should spend some time thinking about the strengths and weaknesses of each technique and the problem domain itself before rushing off to implement. For example, what is the cost of missed detections? What about false alarms?

DO NOT USE ALL THE DATA TO TRAIN THE CLASSIFIER - instead use 2/3 to train and 1/3 for testing! Think in terms of training and test sets. DON'T just run your techniques on the entire dataset. Instead, when it makes sense to you, divide your dataset into two pieces (not necessarily equal) and SEE FOR YOURSELF HOW YOU'RE PERFORMING. If you train on one set of data and then test on another, and you correctly classify the test set, that indicates a high confidence in the result. It's hard to criticize that type of method. Also think about the results. When are the results acceptable? When are the results possibly really misleading?

If you use J48, just submit your interpretation of the tree. We'll forgo implementation in CLIPS for another class. Otherwise, just use WEKA or your own perceptron code to create a classifier.
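For anyone using their own code rather than Weka's percentage split, the 2/3 train and 1/3 test procedure described above can be sketched as follows. The helper names `split_train_test` and `accuracy`, the fixed seed, and the toy rows are all illustrative assumptions; a real run would load one of the datasets from the data folder instead.

```python
import random


def split_train_test(rows, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(classify, test_rows):
    """Fraction of held-out (features, label) rows that classify() gets right."""
    correct = sum(1 for features, label in test_rows if classify(features) == label)
    return correct / len(test_rows)


# Toy data for illustration only: one feature, label is its parity.
rows = [((i,), i % 2) for i in range(30)]
train, test = split_train_test(rows)
print(len(train), "training rows,", len(test), "test rows")

# A stand-in classifier; in practice this would be fitted on `train` only.
print("test accuracy:", accuracy(lambda features: features[0] % 2, test))
```

The key design point is that the classifier never sees the test rows during training, so the reported accuracy is an estimate of performance on new data rather than a measure of memorization.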

7. SELECT TWO PROBLEMS from the problems below (A,B,C, etc) to work on.

(A) Wine classification: Use the WINE DATA in the link in the DATA FOLDER and attempt to classify wine origin based on its chemical properties.

(B) Here is a typical problem dealing with a real corporation and real people. The problem is credit approval. A Japanese company uses a complicated technique to approve credit and wants to see if it can use some data mining technique to classify people as + (grant credit) or - (deny credit). So they turn over a portion of the database to you with all the attributes "coded" to protect anonymity. In other words, they may change something like the person's job title to "a" instead of "bank clerk". They're consistent in that all "a"s are bank clerks, but all that appears in the data for the 'occupation' attribute is "a", "b", etc. Worse, they won't even tell you WHAT the real attributes are. All you know is that + means they approved and - means they denied. So your job is to see what you can do to predict a + or - and how well you can do that. A solution to this problem is just going to be a technique and an estimate of how accurate it is. You won't be able to make much domain sense of your answer by saying something like "if occupation is bank clerk, then deny". SEE THE CREDIT SCREENING DATA in the data link given in the DATA FOLDER under CREDIT APPROVAL.

(C) Assume a corporation needs to know the yearly income of a customer and does not want to annoy the customer by asking this question directly. Instead, the desire is to ask the consumer some "innocent" questions that can predict yearly income. The corporation gives you US Census data and asks you if this can be used to classify people as to whether they earn above or below $50,000 yearly. What are some questions that might be asked given this data? The data is found under ADULT DATABASE in the data link in the DATA FOLDER. Assume your job is to come up with an automatic way to classify consumers if given the same data as appears in the data file. Justify and test your approach. What happened? How accurate were you?

(D) What can you tell me about the MUSHROOM DATA given in the data link in the DATA FOLDER? If you can build a classifier, do that. Do whatever you can with this data and tell me what you did and why. Justify your results and approach.

(E) Arrhythmia is a heart ailment. There is a database of people who either are normal or suffer from some type of arrhythmia, given on the data link in the DATA FOLDER. Assume your job is to come up with an automatic way to classify patients if given the same data as appears in the data file. Justify and test your approach. What happened? How accurate were you?

(F) Select any data set from the provided data folder link and apply any technique you wish, or a mix of techniques to see how they agree (e.g., J48 and Bayes). Your choice - whatever interests you.

(G) Solve using any technique you wish, but JUSTIFY your choice in a paragraph explaining why you selected that technique and how you tested the classifier!

Use the LUNG CANCER DATASET (Attribute 1 is the class variable). NOTE THAT THERE ARE MISSING VALUES IN THIS DATA!

To receive credit you must present the classification results.
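Missing values have to be handled somehow before most classifiers can be trained. One simple option, sketched below, is to replace each missing entry with the most common value in its column. This is only one possible strategy, and the sketch makes several assumptions: that missing entries are marked with "?", that the attributes are categorical, and that the class column itself has no missing values. The function `impute_mode` and the sample rows are hypothetical, not taken from the lung cancer file.

```python
from collections import Counter


def impute_mode(rows, missing="?"):
    """Replace each missing entry with the most common known value in its column."""
    fill_values = []
    for column in zip(*rows):
        known = [value for value in column if value != missing]
        # If a column has no known values at all, leave the marker in place.
        fill_values.append(Counter(known).most_common(1)[0][0] if known else missing)
    return [
        [fill_values[i] if value == missing else value for i, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical rows for illustration: first field is the class, "?" is missing.
rows = [
    ["1", "0", "3"],
    ["2", "?", "3"],
    ["1", "0", "?"],
    ["1", "2", "1"],
]
for row in impute_mode(rows):
    print(row)
```

If the data is split into training and test sets, the fill values would ideally be computed from the training rows only and then applied to the test rows, so that no information leaks from the test set.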


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd