Apply decision tree classification technique

Reference no: EM132754106, Length: 2500 words

Question 1: You are given a training dataset, "trainDataset.csv", and a testing dataset, "testDataset.csv", both provided in electronic form. The data are extracted and pre-processed from the original Titanic dataset. Each object (a passenger in this case) has the following attributes:
• Survived: indicates whether the passenger survived (1) or did not survive (0);
• PC (Passenger Class): the passenger's class on the ship;
• Sex: indicates the passenger's sex;
• Age: indicates the passenger's age group at the time of the ship's departure;
• SS (Sibling Spouse): indicates the number of siblings/spouses that the passenger has on the ship.
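
As a starting point, the records could be read into a list of dictionaries keyed by attribute name. The sketch below is only illustrative: the header row (Survived, PC, Sex, Age, SS) and the sample values are assumptions based on the attribute list above, since the real layout of "trainDataset.csv" is not shown here, and `load_rows` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample standing in for trainDataset.csv; the real header and
# value encodings may differ.
SAMPLE = """Survived,PC,Sex,Age,SS
1,1,female,adult,0
0,3,male,adult,1
"""

def load_rows(handle):
    """Read passenger records, keeping attributes as strings and the label as int."""
    rows = []
    for record in csv.DictReader(handle):
        record["Survived"] = int(record["Survived"])
        rows.append(record)
    return rows

rows = load_rows(io.StringIO(SAMPLE))
print(rows[0])
```

For the real file, `open("trainDataset.csv", newline="")` would replace the in-memory `io.StringIO` handle.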

You are required to apply the decision tree classification technique and association rule evaluation to the above case appropriately. Specifically, you are required to:

1. Using the training dataset, apply the basic Hunt's Algorithm to train a fully-grown decision tree model, selecting attributes in the sequence PC -> Age -> Sex -> SS. If an attribute has multiple attribute values, use a multiway split (do not use a binary split). Each leaf node should be declared as a single class label (do not use a probability/fraction).
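
Task 1 can be sketched as a short recursive procedure. The version below is a minimal illustration under stated assumptions, not a model answer: the six records are invented toy data, the function names `hunt` and `majority` are hypothetical, the fixed sequence is read as "one attribute per tree level", and ties in a mixed leaf are broken toward the smaller label.

```python
from collections import Counter

# Hypothetical toy records; the real rows come from trainDataset.csv.
ROWS = [
    {"PC": "1", "Age": "adult", "Sex": "female", "SS": "0", "Survived": 1},
    {"PC": "1", "Age": "adult", "Sex": "male",   "SS": "0", "Survived": 0},
    {"PC": "1", "Age": "child", "Sex": "male",   "SS": "1", "Survived": 1},
    {"PC": "3", "Age": "adult", "Sex": "male",   "SS": "0", "Survived": 0},
    {"PC": "3", "Age": "adult", "Sex": "female", "SS": "1", "Survived": 0},
    {"PC": "2", "Age": "adult", "Sex": "female", "SS": "0", "Survived": 1},
]

def majority(rows):
    """Most frequent class label; ties go to the smaller label (an assumption)."""
    counts = Counter(r["Survived"] for r in rows)
    return max(sorted(counts), key=counts.get)

def hunt(rows, attrs):
    """Grow a tree with a fixed attribute order and multiway splits."""
    labels = {r["Survived"] for r in rows}
    if len(labels) == 1:          # pure node: declare a leaf
        return labels.pop()
    if not attrs:                 # attributes exhausted: majority-class leaf
        return majority(rows)
    attr, rest = attrs[0], attrs[1:]
    children = {}
    # One branch per attribute value observed at this node (multiway split).
    for value in sorted({r[attr] for r in rows}):
        subset = [r for r in rows if r[attr] == value]
        children[value] = hunt(subset, rest)
    return {"attr": attr, "children": children}

tree = hunt(ROWS, ["PC", "Age", "Sex", "SS"])
print(tree)
```

On the toy data, the PC = 3 branch is already pure and becomes a leaf at once, while the PC = 1 branch needs Age and then Sex before every leaf holds a single class.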

2. Using the training dataset, apply the Greedy strategy combined with the Gini impurity measure to rebuild a fully-grown decision tree. If an attribute has multiple attribute values, use a multiway split (do not use a binary split). Each leaf node should be declared as a single class label (do not use a probability/fraction). Provide sample calculations and explanations to demonstrate how the Greedy strategy and the Gini impurity measure are applied.
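
The kind of sample calculation Task 2 asks for can be illustrated as follows. The Gini impurity of a node is 1 minus the sum of squared class proportions, and a multiway split is scored by the size-weighted average of its child impurities; the greedy step picks the attribute with the lowest score. The records and the helper names (`gini`, `split_gini`, `best_attribute`) are hypothetical.

```python
from collections import Counter

# Hypothetical toy records; the real rows come from trainDataset.csv.
ROWS = [
    {"PC": "1", "Age": "adult", "Sex": "female", "SS": "0", "Survived": 1},
    {"PC": "1", "Age": "adult", "Sex": "male",   "SS": "0", "Survived": 0},
    {"PC": "1", "Age": "child", "Sex": "male",   "SS": "1", "Survived": 1},
    {"PC": "3", "Age": "adult", "Sex": "male",   "SS": "0", "Survived": 0},
    {"PC": "3", "Age": "adult", "Sex": "female", "SS": "1", "Survived": 0},
    {"PC": "2", "Age": "adult", "Sex": "female", "SS": "0", "Survived": 1},
]

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(rows, attr):
    """Weighted Gini impurity of a multiway split on attr."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r["Survived"])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())

def best_attribute(rows, attrs):
    """Greedy choice: the attribute whose split has the lowest weighted Gini."""
    return min(attrs, key=lambda a: split_gini(rows, a))

ATTRS = ["PC", "Age", "Sex", "SS"]
for a in ATTRS:
    print(a, round(split_gini(ROWS, a), 4))
print("root split:", best_attribute(ROWS, ATTRS))
```

Worked by hand on the toy data: the root has three survivors and three non-survivors, so its impurity is 1 - (0.5² + 0.5²) = 0.5. Splitting on PC gives children of sizes 3, 1 and 2 with impurities 4/9, 0 and 0, so the weighted score is (3/6)(4/9) = 2/9, which is the lowest of the four attributes. The same comparison is repeated at every child node until the tree is fully grown.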

3. Use the test dataset to test the two fully-grown decision tree models, and discuss the results.
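
Testing a tree amounts to routing each test record from the root to a leaf and comparing the predicted label with the actual one. The sketch below assumes a nested-dictionary tree in which each internal node also stores a hypothetical "default" label, used when a test record carries an attribute value that never appeared in training; the tree and the test records are invented for illustration.

```python
from collections import Counter

# Hypothetical tree; "default" is the fallback label for unseen attribute values.
TREE = {
    "attr": "Sex", "default": 0,
    "children": {
        "female": 1,
        "male": {"attr": "Age", "default": 0,
                 "children": {"child": 1, "adult": 0}},
    },
}

# Hypothetical test records; the real ones come from testDataset.csv.
TEST_ROWS = [
    {"Sex": "female", "Age": "adult",  "Survived": 1},
    {"Sex": "male",   "Age": "adult",  "Survived": 0},
    {"Sex": "male",   "Age": "child",  "Survived": 0},
    {"Sex": "female", "Age": "child",  "Survived": 1},
    {"Sex": "male",   "Age": "senior", "Survived": 1},  # "senior" unseen in the tree
]

def predict(node, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["attr"]], node["default"])
    return node

def evaluate(tree, rows):
    """Return accuracy and a Counter keyed by (actual, predicted)."""
    confusion = Counter((r["Survived"], predict(tree, r)) for r in rows)
    correct = sum(n for (actual, pred), n in confusion.items() if actual == pred)
    return correct / len(rows), confusion

accuracy, confusion = evaluate(TREE, TEST_ROWS)
print(accuracy, dict(confusion))
```

Running the same `evaluate` call on both fully-grown trees gives directly comparable accuracy figures and confusion counts for the discussion.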

4. Apply post-pruning to the two fully-grown decision trees using the following rules: (i) prune any sub-tree if all of its leaf nodes have the same class label, and (ii) prune any sub-tree if the number of objects (passengers) at each of its leaf nodes is not more than one. After pruning, test the two pruned decision trees using the test dataset. Discuss the results.
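
The two pruning rules can be expressed as checks over the leaves of each sub-tree. The sketch below is one possible reading, with several assumptions: leaves store their training class counts, a pruned sub-tree becomes a leaf labelled with the majority class of the merged counts (ties toward the smaller label), and the rules are applied in a single top-down pass over the original tree. Repeating the pass until nothing changes is another defensible interpretation. The tree and the names `make_leaf`, `leaves` and `prune` are hypothetical.

```python
from collections import Counter

def make_leaf(counts):
    """Leaf holding training class counts and the majority label."""
    label = max(sorted(counts), key=counts.get, default=0)
    return {"label": label, "counts": Counter(counts)}

def leaves(node):
    """All leaf nodes below (or equal to) node."""
    if "label" in node:
        return [node]
    return [leaf for child in node["children"].values() for leaf in leaves(child)]

def prune(node):
    """Apply rule (i) same label and rule (ii) at most one object per leaf."""
    if "label" in node:
        return node
    below = leaves(node)
    same_label = len({leaf["label"] for leaf in below}) == 1
    tiny = all(sum(leaf["counts"].values()) <= 1 for leaf in below)
    if same_label or tiny:
        merged = Counter()
        for leaf in below:
            merged.update(leaf["counts"])
        return make_leaf(merged)
    return {"attr": node["attr"],
            "children": {v: prune(c) for v, c in node["children"].items()}}

# Hypothetical fully-grown tree with training counts at the leaves.
TREE = {"attr": "Sex", "children": {
    "female": {"attr": "PC", "children": {
        "1": make_leaf({1: 3}), "2": make_leaf({1: 2})}},      # rule (i) applies
    "male": {"attr": "Age", "children": {
        "child": make_leaf({1: 1}), "adult": make_leaf({0: 1}),
        "senior": make_leaf({0: 1})}},                          # rule (ii) applies
}}

pruned = prune(TREE)
print(pruned)
```

In this toy case the female sub-tree collapses to a "survived" leaf holding five passengers, and the male sub-tree collapses to a "not survived" leaf because two of its three single-passenger leaves carry label 0.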

5. From the two pruned decision trees, extract an association rule for each leaf node based on the information on the path from the root node to that leaf node. Evaluate the support, confidence, and lift of the identified association rules using the training dataset. Discuss the results.
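
Each root-to-leaf path reads as a rule of the form "conditions -> Survived = label". For a rule A -> C over N training records, support = count(A and C) / N, confidence = count(A and C) / count(A), and lift = confidence / (count(C) / N). The sketch below applies these standard definitions to an invented pruned tree and invented records; `rules_from_tree` and `metrics` are hypothetical helper names.

```python
# Hypothetical toy records; the real rows come from trainDataset.csv.
ROWS = [
    {"PC": "1", "Age": "adult", "Sex": "female", "SS": "0", "Survived": 1},
    {"PC": "1", "Age": "adult", "Sex": "male",   "SS": "0", "Survived": 0},
    {"PC": "1", "Age": "child", "Sex": "male",   "SS": "1", "Survived": 1},
    {"PC": "3", "Age": "adult", "Sex": "male",   "SS": "0", "Survived": 0},
    {"PC": "3", "Age": "adult", "Sex": "female", "SS": "1", "Survived": 0},
    {"PC": "2", "Age": "adult", "Sex": "female", "SS": "0", "Survived": 1},
]

# Hypothetical pruned tree; leaves are plain class labels.
TREE = {"attr": "PC", "children": {
    "1": {"attr": "Sex", "children": {"female": 1, "male": 0}},
    "2": 1,
    "3": 0,
}}

def rules_from_tree(node, path=()):
    """One (conditions, label) rule per leaf, read off the root-to-leaf path."""
    if not isinstance(node, dict):
        return [(dict(path), node)]
    rules = []
    for value, child in node["children"].items():
        rules += rules_from_tree(child, path + ((node["attr"], value),))
    return rules

def metrics(conditions, label, rows):
    """Support, confidence and lift of 'conditions -> Survived = label'."""
    n = len(rows)
    antecedent = [r for r in rows if all(r[a] == v for a, v in conditions.items())]
    both = sum(r["Survived"] == label for r in antecedent)
    consequent = sum(r["Survived"] == label for r in rows)
    support = both / n
    confidence = both / len(antecedent) if antecedent else 0.0
    lift = confidence / (consequent / n) if consequent else 0.0
    return support, confidence, lift

rules = rules_from_tree(TREE)
for conditions, label in rules:
    print(conditions, "->", label, metrics(conditions, label, ROWS))
```

On the toy data, the rule "PC = 3 -> not survived" covers two of six records with no exceptions, giving support 1/3, confidence 1 and lift 2. The rule "PC = 1 and Sex = male -> not survived" has confidence 0.5 and lift 1, meaning its antecedent tells us nothing beyond the base rate.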

Note:
As the majority of the tasks in this assignment are problem-solving based, the word count will be treated as flexible: if all the required tasks have been appropriately addressed, you will not be penalised for a word count of fewer than 3500 words. However, 3500 words should be treated as an upper limit.

Assessment Criteria:
The assessment criteria will generally follow the marking guidelines provided in the Management School Student Handbook. Specific assessment criteria are highlighted below:
1. Demonstrate understanding and knowledge of the relevant concepts, theories and techniques in data mining and machine learning;
2. Demonstrate ability to apply relevant techniques and tools of data mining and machine learning to solve the given problem and tasks;
3. Critically and analytically discuss results in a structured and logical manner;
4. Demonstrate ability to support your arguments with evidence and references;
5. Appropriate structure, presentation, use of English and use of the Harvard referencing style, e.g. figures and tables should be displayed legibly at the 100% zoom scale in a full-screen mode.

Attachment: coursework.rar






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd