Apply decision tree classification technique

Reference no: EM132754106, Length: word count: 2500

Question 1: You are given a training dataset, "trainDataset.csv", and a testing dataset, "testDataset.csv", which will be provided in electronic form. The data are extracted and pre-processed from the original Titanic dataset. The attributes of each object (a passenger in this case) are defined as follows:
• Survived: represents whether the passenger survived (1) or did not survive (0);
• PC (Passenger Class): the class in which the passenger travelled on the ship;
• Sex: indicates the passenger's sex;
• Age: indicates the passenger's age group at the time of departure;
• SS (Siblings/Spouses): indicates the number of siblings and spouses that the passenger had on the ship.

You are required to apply the decision tree classification technique and association rule evaluation appropriately to the above case. Specifically, you are required to:

1. Using the training dataset, apply the basic Hunt's algorithm to train a fully-grown decision tree model, selecting attributes in the sequence PC -> Age -> Sex -> SS. If an attribute has multiple values, use a multiway split (not a binary split). Each leaf node should be assigned a single class label (not a probability or fraction).

2. Using the training dataset, apply the greedy strategy combined with the Gini impurity measure to rebuild a fully-grown decision tree. If an attribute has multiple values, use a multiway split (not a binary split). Each leaf node should be assigned a single class label (not a probability or fraction). Provide sample calculations and explanations that demonstrate how the greedy strategy and the Gini impurity measure are applied.

3. Use the test dataset to test the two fully-grown decision tree models, and discuss the results.

4. Post-prune the two fully-grown decision trees by applying the following rules: (i) prune any sub-tree whose leaf nodes all have the same class label, and (ii) prune any sub-tree if the number of objects (passengers) at each of its leaf nodes is not more than one. After pruning, test the two pruned decision trees using the test dataset and discuss the results.

5. From the two pruned decision trees, extract an association rule for each leaf node based on the path from the root node to that leaf node. Evaluate the support, confidence, and lift of the identified association rules using the training dataset, and discuss the results.
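Tasks 1 to 3 can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not a model answer: the seven rows in TRAIN are made-up toy data in an assumed encoding (they are not taken from trainDataset.csv), and the function names (build, fixed_order, greedy_gini, predict, accuracy) are hypothetical. Both trees come from the same recursive Hunt's procedure; only the attribute chooser differs. Task 1 takes the next attribute in the PC -> Age -> Sex -> SS sequence, while Task 2 greedily picks the attribute whose multiway split has the lowest weighted Gini impurity, where Gini(t) = 1 - sum over classes i of p(i|t)^2 and the split score is the sum over child nodes k of (n_k/n) * Gini(k). Majority-vote ties are broken towards label 0, and an attribute value unseen during training falls back to the majority label of the current node; both rules are assumptions.

```python
from collections import Counter

LABEL = "Survived"
ORDER = ["PC", "Age", "Sex", "SS"]  # attribute sequence required by Task 1

# Hypothetical toy rows (NOT the coursework data); values are assumed encodings.
RAW = [
    ("1st", "adult", "female", "0", 1),
    ("1st", "adult", "male", "0", 0),
    ("1st", "child", "female", "1", 1),
    ("3rd", "adult", "female", "1", 1),
    ("3rd", "adult", "male", "0", 0),
    ("3rd", "adult", "male", "1", 0),
    ("3rd", "child", "male", "0", 1),
]
TRAIN = [dict(zip(ORDER + [LABEL], r)) for r in RAW]


def majority(rows):
    """Most frequent class label; ties go to the smaller label (assumption)."""
    counts = Counter(r[LABEL] for r in rows)
    return max(sorted(counts), key=lambda c: counts[c])


def gini(rows):
    """Gini(t) = 1 - sum_i p(i|t)^2 over the class labels at node t."""
    n = len(rows)
    counts = Counter(r[LABEL] for r in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def partition(rows, attr):
    """Multiway split: one group per attribute value present in rows."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups


def split_gini(rows, attr):
    """Weighted Gini of the children produced by a multiway split on attr."""
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in partition(rows, attr).values())


def fixed_order(rows, attrs):
    return attrs[0]  # Task 1: next attribute in the prescribed sequence


def greedy_gini(rows, attrs):
    return min(attrs, key=lambda a: split_gini(rows, a))  # Task 2: lowest weighted Gini


def build(rows, attrs, choose):
    """Hunt's algorithm: stop on a pure node or when no attributes remain."""
    if len({r[LABEL] for r in rows}) == 1 or not attrs:
        return {"leaf": majority(rows), "n": len(rows)}
    attr = choose(rows, attrs)
    rest = [a for a in attrs if a != attr]
    children = {v: build(g, rest, choose) for v, g in partition(rows, attr).items()}
    return {"attr": attr, "children": children, "default": majority(rows)}


def predict(tree, row):
    while "leaf" not in tree:
        child = tree["children"].get(row[tree["attr"]])
        if child is None:  # value unseen in training: use this node's majority (assumption)
            return tree["default"]
        tree = child
    return tree["leaf"]


def accuracy(tree, rows):
    return sum(predict(tree, r) == r[LABEL] for r in rows) / len(rows)


fixed_tree = build(TRAIN, ORDER, fixed_order)
greedy_tree = build(TRAIN, ORDER, greedy_gini)
print(greedy_gini(TRAIN, ORDER), accuracy(fixed_tree, TRAIN), accuracy(greedy_tree, TRAIN))
```

On this toy data the greedy chooser puts Sex at the root (weighted Gini 3/14 ≈ 0.214, against 12/35 for Age and 10/21 for PC and SS), and both fully-grown trees classify every training row correctly. For Task 3, accuracy would instead be called with the rows of testDataset.csv, for example read with csv.DictReader and with Survived converted to int; that loading step is left out here.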
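The two post-pruning rules in Task 4 can be sketched as a bottom-up pass over a tree. This assumes one reading of the rules: a node whose children are all leaves is collapsed into a single leaf either when those leaves share a class label (rule i) or when each of them holds at most one passenger (rule ii). Because children are pruned before their parent, a collapse can cascade upwards. The tree below is hypothetical, each leaf stores its class counts so that the merged leaf can take the majority label, and the tie-break towards label 0 is an assumption.

```python
from collections import Counter


def leaf(counts):
    """Leaf node storing the class counts of the training objects that reach it."""
    return {"counts": Counter(counts)}


def node(attr, children):
    return {"attr": attr, "children": children}


def is_leaf(t):
    return "children" not in t


def label(counts):
    """Majority class of a leaf; ties go to the smaller label (assumption)."""
    return max(sorted(counts), key=lambda c: counts[c])


def prune(t):
    """Bottom-up post-pruning under one reading of the Task 4 rules."""
    if is_leaf(t):
        return t
    children = {v: prune(c) for v, c in t["children"].items()}
    if all(is_leaf(c) for c in children.values()):
        same_label = len({label(c["counts"]) for c in children.values()}) == 1  # rule (i)
        at_most_one = all(sum(c["counts"].values()) <= 1 for c in children.values())  # rule (ii)
        if same_label or at_most_one:
            merged = Counter()
            for c in children.values():
                merged.update(c["counts"])
            return leaf(merged)  # replace the whole sub-tree by a single leaf
    return node(t["attr"], children)


def count_leaves(t):
    return 1 if is_leaf(t) else sum(count_leaves(c) for c in t["children"].values())


# Hypothetical fully-grown tree; counts are {class label: number of passengers}.
tree = node("PC", {
    "1st": node("Age", {"adult": leaf({1: 5, 0: 1}), "child": leaf({1: 2})}),
    "3rd": node("Age", {
        "adult": node("SS", {"0": leaf({1: 1}), "1": leaf({1: 1}), "2": leaf({0: 1})}),
        "child": leaf({0: 3, 1: 1}),
    }),
})

pruned = prune(tree)
print(count_leaves(tree), count_leaves(pruned))
```

Here the 1st-class Age sub-tree collapses under rule (i), the 3rd-class adult SS sub-tree collapses under rule (ii) into a leaf labelled 1 (two survivors, one non-survivor), and the leaf count drops from 6 to 3.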
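For Task 5, each root-to-leaf path gives a rule X -> Y, where X is the set of attribute tests on the path and Y is the leaf's Survived label. The sketch below uses the usual definitions with N training objects and σ(·) the number of objects satisfying a condition: support = σ(X ∪ Y)/N, confidence = σ(X ∪ Y)/σ(X), and lift = confidence/(σ(Y)/N). The tree and the seven training rows are hypothetical toy data, not the coursework files, and extract_rules and rule_metrics are illustrative names.

```python
LABEL = "Survived"

# Hypothetical toy training rows (NOT the coursework data).
TRAIN = [
    dict(PC=pc, Age=age, Sex=sex, SS=ss, Survived=y)
    for pc, age, sex, ss, y in [
        ("1st", "adult", "female", "0", 1),
        ("1st", "adult", "male", "0", 0),
        ("1st", "child", "female", "1", 1),
        ("3rd", "adult", "female", "1", 1),
        ("3rd", "adult", "male", "0", 0),
        ("3rd", "adult", "male", "1", 0),
        ("3rd", "child", "male", "0", 1),
    ]
]

# Hypothetical pruned tree: internal nodes split on "attr", leaves carry a class label.
TREE = {"attr": "Sex", "children": {
    "female": {"leaf": 1},
    "male": {"attr": "Age", "children": {"adult": {"leaf": 0}, "child": {"leaf": 1}}},
}}


def extract_rules(tree, path=None):
    """Yield one rule per leaf: {attr: value, ...} (root-to-leaf path) -> class label."""
    path = path or {}
    if "leaf" in tree:
        yield dict(path), tree["leaf"]
        return
    for value, child in tree["children"].items():
        yield from extract_rules(child, {**path, tree["attr"]: value})


def rule_metrics(rows, antecedent, consequent):
    """Support, confidence and lift of: antecedent -> Survived == consequent."""
    n = len(rows)
    covered = [r for r in rows if all(r[k] == v for k, v in antecedent.items())]
    n_xy = sum(r[LABEL] == consequent for r in covered)  # sigma(X u Y)
    n_y = sum(r[LABEL] == consequent for r in rows)      # sigma(Y)
    support = n_xy / n
    confidence = n_xy / len(covered) if covered else 0.0
    lift = confidence / (n_y / n) if n_y else 0.0
    return support, confidence, lift


for antecedent, y in extract_rules(TREE):
    s, c, lift = rule_metrics(TRAIN, antecedent, y)
    print(antecedent, "->", y, f"support={s:.3f} confidence={c:.3f} lift={lift:.3f}")
```

A lift above 1 means the antecedent raises the chance of the predicted outcome above its base rate in the training data; for example, {Sex: male, Age: adult} -> Survived = 0 has confidence 1.0 and lift 7/3 ≈ 2.33 on the toy rows.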

As most of the tasks in this assignment are problem-solving based, the word count will be treated flexibly: provided that all the required tasks have been appropriately addressed, you will not be penalised for writing fewer than 3500 words. However, 3500 words should be treated as an upper limit.

Assessment Criteria:
The assessment criteria will generally follow the marking guidelines provided in the Management School Student Handbook. Specific assessment criteria are highlighted below:
1. Demonstrate understanding and knowledge of the relevant concepts, theories and techniques in data mining and machine learning;
2. Demonstrate ability to apply relevant techniques and tools of data mining and machine learning to solve the given problem and tasks;
3. Critically and analytically discuss results in a structured and logical manner;
4. Demonstrate ability to support your arguments with evidence and references;
5. Appropriate structure, presentation, use of English and use of the Harvard referencing style, e.g. figures and tables should be displayed legibly at the 100% zoom scale in a full-screen mode.

Attachment: coursework.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd