Calculate the normal vector of the hyperplane

Reference no: EM13803180

Question 1: Consider the ArnetMiner citation dataset (provided by arnetminer.org) as of the year 2012.

(1) Count the number of authors, venues (conferences/journals), and publications in the dataset.

(2) What are the min, max, Q1, Q3, and median number of publications per author? Plot a histogram of the number of publications per author.

(3) What are the min, max, Q1, Q3, and median number of citations per author? Plot a histogram of the number of citations received per author.

(4) Plot a scatter plot of the number of publications versus the number of citations for authors who have more than 5 publications.
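As a minimal sketch of the summary statistics asked for in parts (2) and (3), the snippet below computes a five-number summary with Python's standard library. The list `pubs_per_author` is hypothetical toy data, not taken from the ArnetMiner dataset, and the "inclusive" quartile convention is an assumption; other conventions give slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical toy data: number of publications for ten authors.
# These values are NOT from the ArnetMiner dataset.
pubs_per_author = [1, 1, 2, 3, 3, 4, 7, 8, 12, 40]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(pubs_per_author)
print("min, Q1, median, Q3, max =", summary)

# Part (4) only considers authors with more than 5 publications.
prolific = [n for n in pubs_per_author if n > 5]
print("authors with more than 5 publications:", len(prolific))
```

The same function can be applied to a list of citation counts per author for part (3).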

Question 2: Decision Tree

Construct a decision tree for the following training data, where "Edible" is the class we are going to predict. Information gain is used to select the attributes. Please write down the major steps in the construction process (you need to show the information gain for each candidate attribute when a new node is created in the tree).

[Figure: training data table for the decision tree (1312_img1.png)]
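The information gain of a candidate attribute can be computed as sketched below. The training table itself is only available as an image, so the rows and the attribute names `Color` and `Size` here are hypothetical placeholders; only the class name "Edible" comes from the question.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    total = len(rows)
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return base - remainder


# Hypothetical rows: the real training data is in the figure and is not reproduced here.
rows = [
    {"Color": "red", "Size": "small", "Edible": "yes"},
    {"Color": "red", "Size": "large", "Edible": "yes"},
    {"Color": "green", "Size": "small", "Edible": "no"},
    {"Color": "green", "Size": "large", "Edible": "no"},
]

gains = {a: information_gain(rows, a, "Edible") for a in ("Color", "Size")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

At each new node, the attribute with the largest gain is chosen and the procedure repeats on each resulting subset.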

Question 3: Naïve Bayes

Consider a Naïve Bayes model for spam classification with the vocabulary V = {secret, offer, low, price, valued, customer, today, dollar, million, sports, is, for, play, healthy, pizza}, where each word in the vocabulary is treated as a feature whose value is either 1 or 0, denoting whether the word appears in a message. We have the messages and labels in the following table:

Messages                         Class label
Million dollar offer             Spam
Secret offer today               Spam
Secret is secret                 Spam
Low price for valued customer    non-spam
Play secret sports today         non-spam
Sports is healthy                non-spam
Low price pizza                  non-spam
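As a rough sketch of how the parameters of such a model could be estimated from the table, the snippet below computes class priors and per-word Bernoulli likelihoods by simple counting. The use of unsmoothed maximum-likelihood estimates is an assumption, since the question text does not say whether smoothing (for example Laplace) should be applied.

```python
from collections import defaultdict

vocabulary = [
    "secret", "offer", "low", "price", "valued", "customer", "today", "dollar",
    "million", "sports", "is", "for", "play", "healthy", "pizza",
]

# Messages and labels copied from the table above.
messages = [
    ("Million dollar offer", "spam"),
    ("Secret offer today", "spam"),
    ("Secret is secret", "spam"),
    ("Low price for valued customer", "non-spam"),
    ("Play secret sports today", "non-spam"),
    ("Sports is healthy", "non-spam"),
    ("Low price pizza", "non-spam"),
]

# Each message becomes a set of words: a feature is 1 if the word is present.
docs_per_class = defaultdict(list)
for text, label in messages:
    docs_per_class[label].append(set(text.lower().split()))

# P(class): fraction of messages with that label.
prior = {label: len(docs) / len(messages) for label, docs in docs_per_class.items()}

# P(word = 1 | class): fraction of messages in the class containing the word.
# Assumption: no smoothing, so unseen words get probability 0.
likelihood = {
    label: {w: sum(w in d for d in docs) / len(docs) for w in vocabulary}
    for label, docs in docs_per_class.items()
}

print(prior["spam"], likelihood["spam"]["secret"], likelihood["non-spam"]["secret"])
```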

Question 4: Support Vector Machine

#     x1       x2      class
1     2.46     2.59     1
2     3.05     2.87     1
3     1.12     1.64     1
4     0.01     1.44     1
5     2.2      3.04     1
6     0.41     2.04     1
7     0.53     0.77     1
8     1.89     2.64     1
9     -0.39    0.96     1
10    -0.96    0.08     1
11    2.65     -1.33    -1
12    1.57     -1.7     -1
13    3.05     0.01     -1
14    2.66     -1.15    -1
15    4.51     -0.52    -1
16    3.06     -0.82    -1
17    3.16     -0.56    -1
18    2.05     -0.62    -1
19    0.71     -2.47    -1
20    1.63     -0.91    -1

Given the 20 data points and their class labels above, suppose that solving the dual form of the SVM quadratic program yields the following α values for the data points:

α7 = 0.4952

α18 = 0.0459

α20 = 0.4493

All other αi = 0

(1) Please point out the support vectors in the training points.

(2) Calculate the normal vector of the hyperplane: w

(3) Calculate the bias b according to $b = \frac{1}{N_k} \sum_{k:\,\alpha_k \neq 0} (y_k - w' x_k)$, where $x_k = (x_{k1}, x_{k2})'$ denotes a support vector and $N_k$ is the total number of support vectors.

(4) Write down the learned decision boundary function f(x) = w′x + b (the hyperplane) by substituting w and b with learned values in the formula.

(5) Suppose there is a new data point x = (-1, 2). Use the decision boundary to predict its class label.
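Parts (1) through (5) can be sketched in a few lines of Python. The relation w = Σ αk yk xk used for the normal vector is the standard result from the SVM dual and is not stated in the question itself, so treat it as an assumption of this sketch; the bias follows the averaging formula given in part (3). The names `support_vectors`, `dot`, `f`, and `predict` are illustrative only.

```python
# Support vectors are the points with nonzero alpha: points 7, 18, and 20.
# Each entry is (alpha_k, y_k, x_k) with values copied from the table above.
support_vectors = [
    (0.4952, +1, (0.53, 0.77)),   # point 7
    (0.0459, -1, (2.05, -0.62)),  # point 18
    (0.4493, -1, (1.63, -0.91)),  # point 20
]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Normal vector: w = sum_k alpha_k * y_k * x_k (standard dual relation, assumed here).
w = tuple(sum(a * y * x[i] for a, y, x in support_vectors) for i in range(2))

# Bias: average of (y_k - w'x_k) over the support vectors, as in part (3).
b = sum(y - dot(w, x) for _, y, x in support_vectors) / len(support_vectors)


def f(x):
    """Decision function f(x) = w'x + b."""
    return dot(w, x) + b


def predict(x):
    """Class label from the sign of the decision function."""
    return 1 if f(x) >= 0 else -1


new_point = (-1, 2)
label = predict(new_point)
print("w =", w, "b =", b, "f(x) =", f(new_point), "label =", label)
```

With the α values given, this yields roughly w ≈ (-0.564, 0.819) and b ≈ 0.666, and the new point falls on the positive side of the hyperplane.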

Question 5: Mutual Information and Information Gain

In information theory, mutual information between two discrete random variables is defined as:

$I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$

It measures the mutual dependence of two random variables. What is the connection between mutual information and the information gain used in decision trees? Can you prove it? (Hint: consider Y as the class label, and X as the attribute used to predict Y.)
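Before attempting a proof, the relationship can be checked numerically. The sketch below compares the mutual information I(X; Y) with the quantity H(Y) - H(Y|X), which is how information gain is usually defined for a split on attribute X. The joint distribution `joint` is a hypothetical example, and base-2 logarithms are an assumption since the question does not fix the base. A numerical match on one example is only a sanity check, not a proof.

```python
from math import log2

# Hypothetical joint distribution p(x, y) over attribute X and class label Y.
joint = {
    ("a", "yes"): 0.30, ("a", "no"): 0.10,
    ("b", "yes"): 0.15, ("b", "no"): 0.45,
}

# Marginals p(x) and p(y) obtained by summing the joint distribution.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X; Y) = sum_{x,y} p(x, y) * log( p(x, y) / (p(x) p(y)) )
mutual_information = sum(
    p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items() if p > 0
)

# H(Y) = -sum_y p(y) log p(y)
h_y = -sum(p * log2(p) for p in p_y.values() if p > 0)

# H(Y | X) = -sum_{x,y} p(x, y) log p(y | x), with p(y | x) = p(x, y) / p(x)
h_y_given_x = -sum(p * log2(p / p_x[x]) for (x, y), p in joint.items() if p > 0)

info_gain = h_y - h_y_given_x
print(mutual_information, info_gain)
```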

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd