Draw scatter plots for the given data with initial centroids

Reference no: EM132892094

Programming Assignment

1. Naïve Bayes

(1) Complete uploaded python code.
First, you have to binarize the training set (trainX) of the MAGIC Gamma Telescope data set. Convert each column to a binary variable based on its average value: if a value is greater than the column average, set it to 1; otherwise, set it to 0. Then, using the binarized data set, calculate p_ij = P(x_j = 1 | y = i) (i = class, j = feature).

 

                   Class g Class h
P(x_1=1)
P(x_2=1)
P(x_3=1)
P(x_4=1)
P(x_5=1)
P(x_6=1)
P(x_7=1)
P(x_8=1)
P(x_9=1)
P(x_10=1)
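
The binarization and the conditional probabilities described above can be sketched as follows. This is a minimal illustration on made-up numbers, not the MAGIC data; the helper names `binarize_by_train_mean` and `conditional_probs` and the `toy_*` arrays are assumptions introduced here, and the uploaded skeleton code may organize the work differently.

```python
import numpy as np

def binarize_by_train_mean(X, train_mean):
    """Return 1 where a value is strictly greater than the training-column mean, else 0."""
    return (X > train_mean).astype(int)

def conditional_probs(X_bin, y, classes):
    """Row i, column j holds the estimate of P(x_j = 1 | y = classes[i])."""
    return np.vstack([X_bin[y == c].mean(axis=0) for c in classes])

# Tiny made-up data standing in for trainX and its labels (hypothetical values).
toy_trainX = np.array([[1.0, 10.0],
                       [2.0, 20.0],
                       [3.0, 30.0],
                       [6.0, 40.0]])
toy_trainY = np.array(["g", "g", "h", "h"])

train_mean = toy_trainX.mean(axis=0)   # column means: [3.0, 25.0]
toy_bin = binarize_by_train_mean(toy_trainX, train_mean)
p = conditional_probs(toy_bin, toy_trainY, classes=["g", "h"])
print(p)   # rows: class g, class h; columns: features
```

Note that a value equal to the mean becomes 0, because the rule says "greater than".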

(3) Based on the calculated p_ij, calculate the probability of class g for each test sample (testX), then calculate the accuracy on testX for varying cutoffs. To binarize testX, use the column means of trainX. The prior probabilities of the classes are proportional to the class ratios in the training set. Use cutoff ∈ {0.1, 0.15, 0.2, 0.25, ..., 0.95}. Draw a line plot (x = cutoff, y = accuracy).
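
One way to compute the class-g probability and the accuracy per cutoff is sketched below, assuming a Bernoulli naive Bayes model over the binarized features. The function names, the toy probabilities, and the rule "predict g when P(g | x) >= cutoff" are assumptions made for illustration; the assignment does not say whether the comparison is strict. Plotting is left out here.

```python
import numpy as np

def posterior_g(X_bin, p_g, p_h, prior_g):
    """P(y = g | x) for binary features under a Bernoulli naive Bayes model."""
    eps = 1e-9  # assumption: clip probabilities so log(0) never occurs
    p_g = np.clip(p_g, eps, 1 - eps)
    p_h = np.clip(p_h, eps, 1 - eps)
    log_g = np.log(prior_g) + X_bin @ np.log(p_g) + (1 - X_bin) @ np.log(1 - p_g)
    log_h = np.log(1 - prior_g) + X_bin @ np.log(p_h) + (1 - X_bin) @ np.log(1 - p_h)
    # Normalise in log space: P(g | x) = 1 / (1 + exp(log_h - log_g)).
    return 1.0 / (1.0 + np.exp(log_h - log_g))

def accuracy_by_cutoff(prob_g, y_true, cutoffs):
    """Predict 'g' when P(g | x) >= cutoff, else 'h'; return one accuracy per cutoff."""
    return np.array([np.mean(np.where(prob_g >= c, "g", "h") == y_true)
                     for c in cutoffs])

# Hypothetical binarized test samples, conditional probabilities, and prior.
toy_testX_bin = np.array([[1, 1], [0, 0], [1, 0]])
toy_testY = np.array(["g", "h", "g"])
toy_p_g = np.array([0.8, 0.6])   # P(x_j = 1 | g)
toy_p_h = np.array([0.2, 0.3])   # P(x_j = 1 | h)

prob_g = posterior_g(toy_testX_bin, toy_p_g, toy_p_h, prior_g=0.5)
cutoffs = np.linspace(0.1, 0.95, 18)   # 0.1, 0.15, ..., 0.95
acc = accuracy_by_cutoff(prob_g, toy_testY, cutoffs)
print(np.round(prob_g, 3))
print(np.round(acc, 3))
```

Working in log space avoids underflow when many features are multiplied together; with only 10 features a direct product would also work.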

(4) Explain why the plot in Question 1-(3) has the shape it does.

2. Decision Tree
The aim of the given data set is to predict annual income of people based on the following factors.
age: the age of an individual
capital-gain: capital gains for an individual
capital-loss: capital loss for an individual
hours-per-week: the hours an individual has reported to work per week
sex: 1 if male, 0 if female
native-country: 1 if USA, 0 if others
workclass_[#]: 1 if an individual belongs to workclass #, otherwise 0 (e.g., Workclass_Private is 1 if an individual works for a private company)
education_[#]: 1 if an individual's education level is #, otherwise 0 (education levels: Graduate > 4-year university > "<4-year university" > High school > "<High school" > Preschool)
marital-status_[#]: 1 if an individual's marital status is #, otherwise 0 (Married-civ-spouse corresponds to a civilian spouse, while Married-AF-spouse is a spouse in the Armed Forces)
occupation_[#]: 1 if an individual's occupation is #, otherwise 0
race_[#]: 1 if an individual's race is #, otherwise 0

The target is 'income' (">50K" or "<=50K").
fnlwgt represents the number of people the census believes the entry represents; it is not used in training.
(1) Train a decision tree using entropy with max_depth=3, min_samples_split=100, and min_samples_leaf=50. Then calculate the overall accuracy, the accuracy of class ">50K", and the accuracy of class "<=50K".
overall accuracy | accuracy of class ">50K" | accuracy of class "<=50K"
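
A minimal sketch of this setting with scikit-learn's `DecisionTreeClassifier` follows. The data is synthetic and only stands in for the census features; evaluating on the training data and reading "accuracy of class c" as the fraction of class-c samples predicted correctly are both assumptions, since the assignment does not spell them out.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix and the 'income' target (hypothetical).
X = rng.normal(size=(1000, 4))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 1.0, ">50K", "<=50K")

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                              min_samples_split=100, min_samples_leaf=50,
                              random_state=0)
tree.fit(X, y)
pred = tree.predict(X)

# Overall accuracy, then accuracy restricted to the samples of each true class.
overall = np.mean(pred == y)
per_class = {c: np.mean(pred[y == c] == c) for c in tree.classes_}

print(f"overall: {overall:.3f}")
for c, a in per_class.items():
    print(f"class {c}: {a:.3f}")
```

Reporting the per-class figures next to the overall one makes it visible when a high overall accuracy is driven mostly by the majority class.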

(2) Based on the answer of Question 2-(1), describe the limitations of the trained decision tree model.

(3) Draw the trained tree.

(4) Explain the rule for class ">50K" that contains the most cases.

(5) Explain the rule for class "<=50K" that contains the most cases with an accuracy of 0.7 or higher.
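
To read off the rules, one option is to print the tree as text with `export_text` and summarize each leaf by its predicted class, size, and accuracy. The sketch below uses synthetic data and hypothetical feature names (`f0`, `f1`, `f2`); measuring a rule's "cases" and "accuracy" on the training samples that reach its leaf is an assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Synthetic stand-in data (hypothetical), not the census data set.
X = rng.normal(size=(1000, 3))
y = np.where(X[:, 0] > 0.8, ">50K", "<=50K")
names = ["f0", "f1", "f2"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                              min_samples_split=100, min_samples_leaf=50,
                              random_state=0).fit(X, y)
print(export_text(tree, feature_names=names))   # one line per split or leaf

leaf_ids = tree.apply(X)    # index of the leaf each sample falls into
pred = tree.predict(X)
leaves = []
for leaf in np.unique(leaf_ids):
    mask = leaf_ids == leaf
    leaves.append({"leaf": int(leaf),
                   "predicted": str(pred[mask][0]),
                   "n": int(mask.sum()),
                   "accuracy": float(np.mean(pred[mask] == y[mask]))})

# Largest ">50K" leaf, and largest "<=50K" leaf with accuracy of at least 0.7.
top_high = max((l for l in leaves if l["predicted"] == ">50K"),
               key=lambda l: l["n"], default=None)
top_low = max((l for l in leaves
               if l["predicted"] == "<=50K" and l["accuracy"] >= 0.7),
              key=lambda l: l["n"], default=None)
print(top_high, top_low)
```

The rule itself is the chain of split conditions on the path from the root to the selected leaf, which can be traced in the printed text.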

(6) Train a new tree, changing the split criterion from entropy to Gini impurity, and compare the two models in terms of performance and the generated rules.

3. k-means clustering
This problem uses the data generated from 4 normal distributions for applying k-means clustering.

The k-means implementation in scikit-learn can take initial centroids through 'init'. When init is set to a c-by-p array (c = the number of clusters, p = the number of features), each row is used as a centroid.

(1) Randomly select 4 samples from the given data set and use them as initial centroids. Repeat this procedure 100 times. Then calculate the average silhouette coefficient and the average adjusted Rand index over the 100 iterations.

silhouette coefficient | adjusted Rand index
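
The repeated random-initialization experiment can be sketched as below. The data here is generated on the spot from four hypothetical 2-D normal distributions, standing in for the given data set, and `true_labels` is assumed to record which distribution each sample came from (needed for the adjusted Rand index). `n_init=1` is set because the initial centroids are supplied explicitly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 100 points from each of 4 normal distributions.
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]])
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(100, 2)) for c in centers])
true_labels = np.repeat(np.arange(4), 100)

n_trials = 100
sil = np.empty(n_trials)
ari = np.empty(n_trials)
for t in range(n_trials):
    # Pick 4 distinct samples and pass them as a 4-by-2 'init' array.
    idx = rng.choice(len(X), size=4, replace=False)
    km = KMeans(n_clusters=4, init=X[idx], n_init=1).fit(X)
    sil[t] = silhouette_score(X, km.labels_)
    ari[t] = adjusted_rand_score(true_labels, km.labels_)

print(f"mean silhouette: {sil.mean():.3f}")
print(f"mean adjusted Rand index: {ari.mean():.3f}")
```

Keeping the per-trial scores in arrays, rather than only the running mean, makes it easy to locate the worst trial later with `np.argmin`.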

(2) Randomly select one sample from each normal distribution and use them as initial centroids. Repeat this procedure 100 times. Then calculate the average silhouette coefficient and the average adjusted Rand index over the 100 iterations. (5pts)
silhouette coefficient | adjusted Rand index
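
Picking one sample per distribution needs to know which distribution each sample came from. The helper below is a sketch that assumes such a label array is available with the data; the name `one_sample_per_group` and the toy arrays are hypothetical.

```python
import numpy as np

def one_sample_per_group(X, group_labels, rng):
    """Pick one random row of X from each distinct group label.

    Returns the selected rows (usable as a c-by-p 'init' array) and their indices.
    """
    idx = np.array([rng.choice(np.flatnonzero(group_labels == g))
                    for g in np.unique(group_labels)])
    return X[idx], idx

rng = np.random.default_rng(0)
# Toy data: 8 points, 2 features, 4 groups of 2 points each (made-up values).
toy_X = np.arange(16, dtype=float).reshape(8, 2)
toy_groups = np.array([0, 0, 1, 1, 2, 2, 3, 3])

init_centroids, chosen = one_sample_per_group(toy_X, toy_groups, rng)
print(chosen)
print(init_centroids)
```

The returned array has one row per distribution, so it can be passed as the initial centroids in place of four samples drawn from the whole data set.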

(3) Draw scatter plots of the given data with initial and final centroids for the worst cases among the 100 trials in Question 3-(1), in terms of silhouette coefficient and adjusted Rand index, respectively. Mark the initial centroids with a red 'X' and the final centroids with a blue 'X'.
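
A plotting helper along these lines could be used once the worst trial has been identified (for example, the trial with the lowest score). This is a sketch with matplotlib; the function name `plot_centroids`, the made-up points, and the non-interactive `Agg` backend are assumptions chosen so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

def plot_centroids(X, labels, init_centroids, final_centroids, title):
    """Scatter the data colored by cluster, with initial (red X) and final (blue X) centroids."""
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.scatter(X[:, 0], X[:, 1], c=labels, s=10, cmap="viridis")
    ax.scatter(init_centroids[:, 0], init_centroids[:, 1],
               c="red", marker="X", s=120, label="initial centroids")
    ax.scatter(final_centroids[:, 0], final_centroids[:, 1],
               c="blue", marker="X", s=120, label="final centroids")
    ax.set_title(title)
    ax.legend()
    return fig, ax

# Made-up points, labels, and centroids, only to exercise the helper.
rng = np.random.default_rng(0)
demo_X = rng.normal(size=(40, 2))
demo_labels = rng.integers(0, 4, size=40)
demo_init = demo_X[:4]
demo_final = demo_X[4:8]

fig, ax = plot_centroids(demo_X, demo_labels, demo_init, demo_final,
                         "worst trial (hypothetical data)")
fig.savefig("worst_trial.png")
```

In an actual run, the initial centroids of every trial would need to be stored alongside the scores so the worst trial can be redrawn.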

(4) Draw scatter plots for the worst case of Question 3-(2) in the same way as in Question 3-(3).

(5) Based on the results of the 100 trials for each case, compare the two methods of determining initial centroids.

Attachment:- Programming Assignment.rar




