Decision tree learning for cancer diagnosis, Computer Engineering

Assignment 1: Decision tree learning for cancer diagnosis

In this mini-project, you will implement a decision-tree algorithm and apply it to breast cancer diagnosis. For each patient, an image of a fine needle aspirate (FNA) of a breast mass was taken, and nine features in the image that are potentially correlated with breast cancer were extracted. Your task is to develop a decision tree algorithm, learn from the data, and predict whether new patients have breast cancer. The dataset can be downloaded from the UC Irvine Machine Learning Repository.

1.       Download the data set from my website. Each patient is represented by one line, with columns separated by commas: the first column is the identifier number, the last is the class (benign or malignant), and the rest are attribute values, which are integers ranging from 1 to 10. The attributes are (in case you are curious): Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses. (Note that the UCI documentation page specifies a different number of attributes, because it refers to a set of several related datasets. For detailed information on the dataset that we use here, see this document.)

2.       Implement the ID3 decision tree learner, as described in Chapter 3 of Mitchell. You may program in C/C++ or Java. Your program should assume input in the above format.

3.       Implement both misclassification impurity and information gain as evaluation criteria. Also, implement split stopping using the chi-square test.

4.       Divide the data set randomly into training (80%) and testing (20%) sets. Use your algorithm to train a decision tree classifier and report its accuracy on the test set. Run the same experiment 100 times. Then calculate the average test performance (accuracy, precision, recall, F-measure, G-mean).

5.       Compare performances by varying the evaluation criteria. Make a table as follows:

Evaluation Criteria        | Accuracy | Precision | Recall | F-measure | G-mean
misclassification impurity |          |           |        |           |
information gain           |          |           |        |           |

6.       Answer the following:

a.       Which evaluation criterion and confidence level work well? Why?

b.       Do you see evidence of overfitting in some experiments? Explain.
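
The input format from step 1 (identifier, nine integer attributes, class) could be parsed roughly as follows. This is only a sketch: the class name `PatientRecord`, the `MISSING` marker for values that are not integers, and the sample line in `main` are assumptions made for illustration, not part of the assignment or the data set.

```java
import java.util.Arrays;

// Hypothetical container for one line of the data file described in step 1.
class PatientRecord {
    static final int NUM_ATTRIBUTES = 9;
    static final int MISSING = -1; // assumption: marker for a value that is not an integer

    final String id;
    final int[] attributes;
    final String label;

    PatientRecord(String id, int[] attributes, String label) {
        this.id = id;
        this.attributes = attributes;
        this.label = label;
    }

    // Parses "id,a1,...,a9,class"; the class column is kept as raw text.
    static PatientRecord parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != NUM_ATTRIBUTES + 2) {
            throw new IllegalArgumentException(
                "expected " + (NUM_ATTRIBUTES + 2) + " columns, got " + fields.length);
        }
        int[] attrs = new int[NUM_ATTRIBUTES];
        for (int i = 0; i < NUM_ATTRIBUTES; i++) {
            try {
                attrs[i] = Integer.parseInt(fields[i + 1].trim());
            } catch (NumberFormatException e) {
                attrs[i] = MISSING; // leave the handling policy to the learner
            }
        }
        return new PatientRecord(fields[0].trim(), attrs, fields[NUM_ATTRIBUTES + 1].trim());
    }

    public static void main(String[] args) {
        // Made-up line in the described format, not taken from the real data.
        PatientRecord r = parse("12345,5,1,1,1,2,1,3,1,1,benign");
        System.out.println(r.id + " " + Arrays.toString(r.attributes) + " " + r.label);
    }
}
```

Keeping the class column as text avoids guessing how the file encodes benign and malignant; the learner can map it to 0/1 once the actual encoding has been checked.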

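For step 2, the recursive structure of an ID3-style learner can be sketched as below. It is a simplified illustration rather than the reference solution: it assumes two classes encoded as 0 and 1, uses information gain only, creates branches only for attribute values seen in training, and omits the chi-square stopping rule. All names (`Id3Sketch`, `Node`, `build`, `predict`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Id3Sketch {
    // A leaf when attribute < 0, otherwise a multiway split on that attribute.
    static class Node {
        int attribute = -1;
        int majority; // majority class of the training rows that reached this node
        Map<Integer, Node> children = new HashMap<>();
    }

    // Entropy in bits of a two-class sample.
    static double entropy(int neg, int pos) {
        double total = neg + pos, h = 0.0;
        for (int c : new int[] {neg, pos}) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    static int[] classCounts(List<Integer> rows, int[] y) {
        int[] counts = new int[2];
        for (int r : rows) counts[y[r]]++;
        return counts;
    }

    // Groups row indices by their value of the given attribute.
    static Map<Integer, List<Integer>> partition(List<Integer> rows, int[][] x, int attr) {
        Map<Integer, List<Integer>> parts = new HashMap<>();
        for (int r : rows) parts.computeIfAbsent(x[r][attr], k -> new ArrayList<>()).add(r);
        return parts;
    }

    static double gain(List<Integer> rows, int[][] x, int[] y, int attr) {
        int[] c = classCounts(rows, y);
        double g = entropy(c[0], c[1]);
        for (List<Integer> part : partition(rows, x, attr).values()) {
            int[] pc = classCounts(part, y);
            g -= (double) part.size() / rows.size() * entropy(pc[0], pc[1]);
        }
        return g;
    }

    static Node build(List<Integer> rows, int[][] x, int[] y, List<Integer> attrs) {
        Node node = new Node();
        int[] c = classCounts(rows, y);
        node.majority = c[1] > c[0] ? 1 : 0;
        if (c[0] == 0 || c[1] == 0 || attrs.isEmpty()) return node; // pure, or nothing left to split on
        int best = attrs.get(0);
        for (int a : attrs) if (gain(rows, x, y, a) > gain(rows, x, y, best)) best = a;
        node.attribute = best;
        List<Integer> rest = new ArrayList<>(attrs);
        rest.remove(Integer.valueOf(best)); // each attribute is used at most once per path
        for (Map.Entry<Integer, List<Integer>> e : partition(rows, x, best).entrySet()) {
            node.children.put(e.getKey(), build(e.getValue(), x, y, rest));
        }
        return node;
    }

    static int predict(Node node, int[] example) {
        while (node.attribute >= 0) {
            Node child = node.children.get(example[node.attribute]);
            if (child == null) return node.majority; // value not seen during training
            node = child;
        }
        return node.majority;
    }

    public static void main(String[] args) {
        int[][] x = {{1, 1}, {1, 10}, {10, 1}, {10, 10}}; // toy data: attribute 0 decides the class
        int[] y = {0, 0, 1, 1};
        Node tree = build(Arrays.asList(0, 1, 2, 3), x, y, Arrays.asList(0, 1));
        System.out.println(predict(tree, new int[] {10, 5}));
    }
}
```

Falling back to the node's majority class for unseen values plays the same role as the "most common value" leaf that ID3 adds for empty branches.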
 
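The two evaluation criteria and the chi-square test from step 3 can all be computed from one table of class counts per branch. The sketch below assumes that representation (`children[v][k]` is the number of training examples with attribute value `v` and class `k`); the class and method names are hypothetical, and the chi-square part uses the standard Pearson statistic, which is one common reading of "split stopping using the chi-square test".

```java
class SplitCriteria {
    static int sum(int[] a) {
        int s = 0;
        for (int v : a) s += v;
        return s;
    }

    // Entropy in bits of a sample with the given class counts.
    static double entropy(int[] counts) {
        int n = sum(counts);
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Misclassification impurity: 1 minus the fraction of the majority class.
    static double misclassification(int[] counts) {
        int n = sum(counts), max = 0;
        for (int c : counts) max = Math.max(max, c);
        return n == 0 ? 0.0 : 1.0 - (double) max / n;
    }

    // Class counts at the parent node (column sums over all branches).
    static int[] parentCounts(int[][] children) {
        int[] parent = new int[children[0].length];
        for (int[] child : children)
            for (int k = 0; k < child.length; k++) parent[k] += child[k];
        return parent;
    }

    // Impurity of the parent minus the size-weighted impurity of the branches.
    static double gain(int[][] children, boolean useEntropy) {
        int[] parent = parentCounts(children);
        int n = sum(parent);
        double g = useEntropy ? entropy(parent) : misclassification(parent);
        for (int[] child : children) {
            double imp = useEntropy ? entropy(child) : misclassification(child);
            g -= (double) sum(child) / n * imp;
        }
        return g;
    }

    // Pearson chi-square statistic: observed branch counts vs. counts expected
    // if the split were independent of the class.
    static double chiSquare(int[][] children) {
        int[] parent = parentCounts(children);
        int n = sum(parent);
        double stat = 0.0;
        for (int[] child : children) {
            for (int k = 0; k < child.length; k++) {
                double expected = (double) sum(child) * parent[k] / n;
                if (expected > 0) stat += Math.pow(child[k] - expected, 2) / expected;
            }
        }
        return stat;
    }

    // (non-empty branches - 1) * (classes present - 1)
    static int degreesOfFreedom(int[][] children) {
        int branches = 0, classes = 0;
        for (int[] child : children) if (sum(child) > 0) branches++;
        for (int c : parentCounts(children)) if (c > 0) classes++;
        return Math.max(0, (branches - 1) * (classes - 1));
    }

    public static void main(String[] args) {
        int[][] skewed = {{2, 0}, {4, 2}}; // two branches, two classes
        System.out.println("information gain:       " + gain(skewed, true));
        System.out.println("misclassification gain: " + gain(skewed, false));
        System.out.println("chi-square:             " + chiSquare(skewed)
            + " with df = " + degreesOfFreedom(skewed));
    }
}
```

A split would be kept only if the statistic exceeds the chi-square critical value for the chosen confidence level and degrees of freedom (for example, 3.841 at 95% with one degree of freedom). The `skewed` example is also worth studying for question 6a: the majority class is the same in the parent and in both branches, so the misclassification criterion sees no improvement even though information gain is positive.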

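The evaluation protocol in step 4 needs a random 80/20 split and five metrics computed from the confusion matrix. A minimal sketch follows, under two assumptions that the assignment does not state: malignant is treated as the positive class, and G-mean is taken as the geometric mean of recall (sensitivity) and specificity, which is a common definition. The names `Evaluation`, `shuffledIndices`, `trainSize` and `metrics` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class Evaluation {
    // Returns the indices 0..n-1 in random order; a different seed gives a different split.
    static List<Integer> shuffledIndices(int n, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        return idx;
    }

    // Number of examples that go to the training set (80%); the rest form the test set.
    static int trainSize(int n) {
        return (int) Math.round(0.8 * n);
    }

    // Returns {accuracy, precision, recall, F-measure, G-mean}.
    // tp/fp/tn/fn are counted on the test set with the positive class fixed in advance.
    static double[] metrics(int tp, int fp, int tn, int fn) {
        double accuracy = (double) (tp + tn) / (tp + fp + tn + fn);
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double specificity = tn + fp == 0 ? 0.0 : (double) tn / (tn + fp);
        double f = precision + recall == 0 ? 0.0
            : 2 * precision * recall / (precision + recall);
        double g = Math.sqrt(recall * specificity); // assumed definition of G-mean
        return new double[] {accuracy, precision, recall, f, g};
    }

    // Element-wise mean over the metric vectors of all runs.
    static double[] average(List<double[]> runs) {
        double[] mean = new double[runs.get(0).length];
        for (double[] run : runs)
            for (int i = 0; i < mean.length; i++) mean[i] += run[i] / runs.size();
        return mean;
    }

    public static void main(String[] args) {
        List<Integer> idx = shuffledIndices(10, 42L);
        int cut = trainSize(idx.size());
        System.out.println("train: " + idx.subList(0, cut));
        System.out.println("test:  " + idx.subList(cut, idx.size()));
        // Made-up confusion-matrix counts, for illustration only.
        System.out.println(Arrays.toString(metrics(40, 5, 90, 5)));
    }
}
```

For the 100 repetitions, one option is to loop over seeds 0 to 99, train and test on each split, collect the `metrics` arrays in a list, and pass the list to `average`.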
