Estimate a bernoulli naive bayes classifer

Assignment Help Database Management System
Reference no: EM13923919

Q1.

Based on the data in the following table,

(1) estimate a Bernoulli Naive Bayes classifer (using the add-one smoothing)

(2) apply the classifier to the test document.

(3) estimate a multinomial Naive Bayes classifier (using the add-one smoothing)

(4) apply the classifier to the test document

You do not need to estimate parameters that you don't need for classifying the test document.


docID words in document Class = China?
training set 1  Taipei Taiwan  Yes

2 Macao Taiwan Shanghai Yes

3 Japan Sapporo NO

4 Sapporo Osaka Taiwan NO 
test set 5 Taiwan Taiwan Taiwan Sapporo Bangkok ?

Q2:

Algorithm 1: k-means(D, k)
Data: D is a dataset of n d-dimensional points; k is the number of clusters.
1 Initialize k centers C = [c1, c2, . . . , ck ];
2 canStop ← false;
3 while canStop = false do
4 Initialize k empty clusters tt = [g1, g2, . . . , gk ];
5 for each data point p ∈ D do
6 cx ← NearestCenter(p, C);
7 gcx .append(p);
8 for each group g ∈ G do
9 ci ← ComputeCenter(g);
10 return G;

Consider the (slightly incomplete) k-means clustering algorithm as depicted in Algorithm 1.

(1) Assume that the stopping criterion is till the algorithm converges to the final k clusters. Can you insert several lines of pseudo-code after Line 8 of the algorithm to implement this logic.

(2) The cost of k clusters

cost(g1, g2, . . . , gk) = ki=1 cost(gi)

where cost(gi) = ∑p∈gi dist(p, ci). dist() is the Euclidean distance. Now show that the cost of k clusters as evaluated at the end of each iteration (i.e., after Line 11 in the current algorithm) never increases. (You may assume d = 2)

(3) Prove that the cost of clusters obtained by k-means algorithm always converges to a local minima. (Hint: you can make use of the previous conclusion even if you have not proved it).

Q3. Consider the given similarity matrix. You are asked to perform group average hierarchical clustering on this dataset.

You need to show the steps and final result of the clustering algorithm. You will show the final results by drawing a dendrogram. The dendrogram should clearly show the order in which the points are merged.

 

p1

p2

p3

p4

p5

p1

1.00

0.10

0.41

0.55

0.35

p2

0.10

1.00

0.64

0.47

0.98

p3

0.41

0.64

1.00

0.44

0.85

p4

0.55

0.47

0.44

1.00

0.76

p5

0.35

0.98

0.85

0.76

1.00

Q4. Play several rounds of the Akinator game at https://au.akinator.com/.

(1) It is not uncommon that users may give completely or partially wrong answers during a game. Assume the site maintains a large table, where each row is about a person, and each column is a Boolean-type question, and each cell value is the correct answer ("Yes" or "No"), and that the core algorithm the site uses is a decision tree. To accommodate possible errors, let's assume the site allows up to one error in a game. That is, a person will still be a candidate if at most one question answer the user provided does not match the correct answer in the data table. Now describe how you will modify the ID3 decision tree construction algorithm to build a decision tree for the site while allowing up to one error in a game.

(2) Assume that you do not think the site uses decision trees as the backbone algo- rithm. What are the reason(s) to support this conjecture? You may list more than one reason. If you design some experiments and will refer to them, please include the setup and the details of the experiments (e.g., something like Figure 1)

Q5.

We consider the linear counting estimator that estimates the number of distinct elements in a data stream. Using this as a building block, we shall derive methods to estimate the number of distinct elements after some common set operations on several data streams.

Let S1 and S2 be two data streams1, and C(Si) be the linear counting estimator for Si using the same hash function h() and same length of bit array (i.e., using m bits and the bit array is denoted as C(Si).B).

(1)Prove that C(S1 ∪ S2) = C(S1) ∨ C(S2). Here ∪ is the multiset union operator, and the ∨ operator on two linear counting estimators C1 and C2 returns a new estimator (with the same hash function) with a m-bit bit array where its j-th entry is the result of bitwise OR of the corresponding bits in C1 and C2, i.e., C1.B[j] | C2.B[j]. (2)Prove that C(S1 ∩ S2) ≠ C(S1) ∧ C(S2). Here ∩ is the multiset intersection operator, and the ∧ operator is defined similar to ∨ except that we use bitwise AND instead of bitwise OR, i.e., C1.B[j] & C2.B[j].

(3) Derive a method to estimate the number of distinct elements in S1 ∩ S2, based only on linear counting estimators.

Reference no: EM13923919

Questions Cloud

Do correct for european commission to restrict mergers : Do you think it is correct for the European Commission to restrict mergers between American companies that do business in Europe?
What rate of interest should the corporate bond pay : You are considering an investment in a U.S. corporate bond but you are not sure what rate of interest it should pay. Assume that the real risk-free rate of interest is 1.0%; inflation is expected to be 2.0%; the maturity risk premium is 1.5%; and, th..
Cell theory most directly violated by slime molds : Biologists have discovered living things that defy (do not agree with) "Cell Theory". Which aspect of the cell theory is most directly violated by slime molds?
What is the companys expected growth rate : WACC and Cost of Common Equity Kahn Inc. has a target capital structure of 65% common equity and 35% debt to fund its $10 billion in operating assets. Furthermore, Kahn Inc. has a WACC of 16%, a before-tax cost of debt of 8%, and a tax rate of 40%. W..
Estimate a bernoulli naive bayes classifer : Estimate a Bernoulli Naive Bayes classifer and apply the classifier to the test document - You are asked to perform group average hierarchical clustering on this dataset.
What is the after-tax cash flow from the sale of equipment : Quick Computing installed its previous generation of computer chip manufacturing equipment 3 years ago. Some of that older equipment will become unnecessary when the company goes into production of its new product. What is the after-tax cash flow fro..
Is smith an employee or an independent contractor : Tom Smith was hired to drive an airport shuttle for a rental car company back and forth from the airport to the rental car company's off-site parking lot. When Smith was hired, he signed a written contract that stated specifically that he was an i..
Using better inventory control systems : Better Mousetraps has developed a new trap. It can go into production for an initial investment in equipment of $6.0 million. The equipment will be depreciated straight line over 6 years to a value of zero, but in fact it can be sold after 6 years fo..
Optimal capital structure consists : Klose Outfitters Inc. believes that its optimal capital structure consists of 60% common equity and 40% debt, and its tax rate is 40%. Klose must raise additional capital to fund its upcoming expansion. The firm will have $3 million of retained earni..

Reviews

Write a Review

Database Management System Questions & Answers

  What is a data warehouse and why is rei building one

What is a data warehouse and why is REI building one and What are some of the risks or concerns surrounding the creation of a data warehouse

  What is the two-phase locking protocol

What is the two-phase locking protocol and what is the strict two-phase locking protocol? What is the rigorous two-phase locking protocol? What benefit does strict two-phase locking protocol provide? What benefit does rigorous two-phase locking pr..

  Describes the movement of inventory

Create a decision table that describes the movement of inventory and draw a decision tree that describes the merchandise inventory management process.

  Brief synopsis analyzing the detailed requirements

Provide a brief synopsis analyzing the detailed requirements of your prototype database design and design a database prototype that includes diagrams, data dictionary, design decisions, limitations, etc.

  Describe a dbms and its functions

Describe a DBMS and its functions. Name some of the popular DBMS software and you should search the Internet for the updated DBMS technology.

  Create a relational schema

There are errors in the Company Schema. Identify all of the errors. Using ER Diagramming notation, diagram a relational model that is capable of modeling a e-commerce-based company. It should include the following information

  Return a result table sorted by patient name

Return a result table that could be used to produce a report, sorted by patient name or date, for a particular week or after a particular date, or a listing of patient visits for patients assigned to a specific social worker

  Create a database

Create a database that implements the proposed data warehouse schema.

  United broke artists (uba) is a broker

United Broke Artists (UBA) is a broker for not-so-famous artists. UBA maintains a small database to track painters, paintings, and galleries. A painting is created by a particular artist and then exhibited in a particular gallery

  How referential integrity constraint prevent data

In physical database design, referential integrity constraints can be defined. What actions does referential integrity constraint prevent from happening when data is inserted in table which contains this constraint?

  What type of databases and database servers myspace use

What kind of databases and database servers does MySpace use? Why is database technology so important for a business such as MySpace?

  Define a structural model

Define a structural model. Why should a systems analyst create one? Give an example of class cohesion for a class named SUPPLIER for your example that supplies car parts. List some of its attributes and at least two OPERATIONS (methods) that would..

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd