Threshold used to find cluster density remains constant

Reference no: EM132727123

1. In CLIQUE, the threshold used to find cluster density remains constant, even as the number of dimensions increases. This is a potential problem since density drops as dimensionality increases; i.e., to find clusters in higher dimensions the threshold has to be set at a level that may well result in the merging of low-dimensional clusters. Comment on whether you feel this is truly a problem and, if so, how you might modify CLIQUE to address this problem.
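To make the dimensionality issue in question 1 concrete, here is a minimal sketch. It assumes uniformly distributed data on the unit hypercube, ξ equal-width intervals per dimension, and a CLIQUE-style rule that a unit is dense when its fraction of points exceeds a fixed threshold τ. Under uniformity, a k-dimensional unit holds an expected fraction ξ^-k of the points, so a fixed τ eventually rejects even units that are several times denser than background. The "relative" threshold is a hypothetical modification for illustration only, not part of CLIQUE.

```python
# Minimal sketch of how a fixed CLIQUE density threshold interacts with dimensionality.
# Assumptions (not from CLIQUE itself): uniform data on the unit hypercube, xi intervals
# per dimension, and a hypothetical "relative" threshold alpha * xi**-k.


def expected_unit_fraction(xi: int, k: int) -> float:
    """Expected fraction of uniformly distributed points in one k-dimensional unit."""
    # In Python, an int raised to a negative int yields a float.
    return xi ** -k


def fixed_threshold_dense(fraction: float, tau: float) -> bool:
    """CLIQUE-style test: the unit is dense if its fraction of points exceeds tau."""
    return fraction > tau


def relative_threshold_dense(fraction: float, xi: int, k: int, alpha: float) -> bool:
    """Hypothetical variant: dense if the fraction exceeds alpha times the uniform expectation."""
    return fraction > alpha * expected_unit_fraction(xi, k)


if __name__ == "__main__":
    xi, tau, alpha = 10, 0.004, 3.0
    for k in range(1, 5):
        # Suppose a genuine cluster puts 5x the uniform share of points into a unit.
        clustered = 5 * expected_unit_fraction(xi, k)
        print(k, clustered,
              fixed_threshold_dense(clustered, tau),
              relative_threshold_dense(clustered, xi, k, alpha))
```

With these illustrative numbers the fixed threshold stops flagging the cluster at k = 4. Lowering τ enough to catch it would also let sparse low-dimensional units pass, which is the merging effect the question describes.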

2. Name at least one situation in which you would not want to use clustering based on SNN similarity or density.
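When weighing question 2, it helps to see what SNN similarity actually computes. The sketch below uses one common definition: two points' similarity is the number of neighbors they share, provided each is in the other's k-nearest-neighbor list, and 0 otherwise. The brute-force neighbor search and the function names are assumptions for illustration. The sketch exposes two properties worth considering: every point needs a k-nearest-neighbor list (quadratic work here), and a point with no mutual neighbors gets zero similarity to everything.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def knn_sets(points: List[Point], k: int) -> List[Set[int]]:
    """Brute-force k-nearest-neighbor index sets (a point is not its own neighbor)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort by (distance, index) so ties are broken deterministically by index.
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in others[:k]})
    return neighbors


def snn_similarity(points: List[Point], k: int) -> List[List[int]]:
    """Shared-neighbor count if i and j are in each other's kNN lists, else 0."""
    nn = knn_sets(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for row in snn_similarity(pts, 3):
        print(row)
```

In this toy data the distant point (10, 10) lists the square's corners as its neighbors, but none of them list it back, so its SNN similarity to every other point is 0.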

3. Give an example of a set of clusters in which merging based on the closeness of clusters leads to a more natural set of clusters than merging based on the strength of connection (interconnectedness) of clusters.

4. We take a sample of adults and measure their heights. If we record the gender of each person, we can calculate the average height and the variance of the height, separately, for men and women. Suppose, however, that this information was not recorded. Would it be possible to still obtain this information? Explain.
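One approach worth exploring for question 4 is to model the unlabeled heights as a mixture of two normal distributions and fit it with the EM algorithm. The sketch below is a minimal, assumption-laden illustration: the heights are synthetic (hypothetical means of 160 cm and 176 cm, standard deviation 6 cm), the initialization at the quartiles is an arbitrary choice, and there is no convergence check.

```python
import math
import random
from typing import List, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs: List[float], iters: int = 200) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n
    # Crude initialization: means at the quartiles, both variances set to the overall variance.
    mus = [xs_sorted[n // 4], xs_sorted[3 * n // 4]]
    vars_ = [var0, var0]
    ws = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [ws[j] * normal_pdf(x, mus[j], vars_[j]) for j in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            ws[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
    return ws, mus, vars_


# Hypothetical data: heights (cm) from two normals, with the gender labels discarded.
random.seed(0)
heights = ([random.gauss(160, 6) for _ in range(1000)]
           + [random.gauss(176, 6) for _ in range(1000)])
weights, means, variances = em_two_gaussians(heights)
print(weights, means, [math.sqrt(v) for v in variances])
```

The fitted components carry no labels. Deciding which one corresponds to men and which to women still requires an outside assumption, and the estimates degrade as the two distributions overlap more.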

5. Explain the difference between likelihood and probability.
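A small numerical illustration can anchor question 5. The sketch uses a binomial model with n = 10 trials. Read as a probability, the formula has p fixed and the outcome k varying, and the values sum to 1 over all outcomes. Read as a likelihood, the observed outcome (here, a hypothetical k = 7) is fixed and p varies. The resulting function of p is not a probability distribution over p: its area is 1/(n + 1) rather than 1.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k | n, p): a probability, viewed as a function of the outcome k."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def likelihood(p: float, k: int, n: int) -> float:
    """L(p | k, n): the same formula, viewed as a function of the parameter p."""
    return binom_pmf(k, n, p)


n, k_obs = 10, 7

# Probability: fix p, vary the outcome; the values sum to 1 over all outcomes.
total_prob = sum(binom_pmf(k, n, 0.5) for k in range(n + 1))

# Likelihood: fix the observed outcome, vary p; midpoint-rule area under L(p) on [0, 1].
steps = 100_000
area = sum(likelihood((i + 0.5) / steps, k_obs, n) for i in range(steps)) / steps

# Maximum-likelihood estimate of p on a grid (analytically k/n = 0.7).
grid = [i / 1000 for i in range(1001)]
p_hat = max(grid, key=lambda p: likelihood(p, k_obs, n))

print(total_prob, area, p_hat)
```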

6. Traditional K-means has a number of limitations, such as sensitivity to outliers and difficulty in handling clusters of different sizes and densities, or with non-globular shapes. Comment on the ability of fuzzy c-means to handle these situations.
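For question 6, the sketch below shows the standard fuzzy c-means updates with squared Euclidean distance and fuzzifier p = 2, restricted to 1-D data for brevity. Memberships are proportional to (1/d²)^(1/(p-1)), and centroids are means weighted by membership^p. Because each point's memberships must sum to 1, an outlier still receives substantial weight in some cluster. The toy data and the starting centroids are hypothetical.

```python
from typing import List, Tuple


def fcm_1d(xs: List[float], centroids: List[float], p: float = 2.0,
           iters: int = 100) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy c-means on 1-D data; returns (centroids, last membership matrix)."""
    cs = list(centroids)
    W: List[List[float]] = []
    for _ in range(iters):
        # Membership update: w_ij proportional to (1 / d_ij^2)^(1/(p-1)); each row sums to 1.
        W = []
        for x in xs:
            d2 = [(x - c) ** 2 for c in cs]
            if 0.0 in d2:
                # The point sits exactly on a centroid: give it full membership there.
                j0 = d2.index(0.0)
                W.append([1.0 if j == j0 else 0.0 for j in range(len(cs))])
                continue
            inv = [(1.0 / d) ** (1.0 / (p - 1.0)) for d in d2]
            s = sum(inv)
            W.append([v / s for v in inv])
        # Centroid update: mean of the points weighted by w_ij^p.
        cs = [sum(W[i][j] ** p * xs[i] for i in range(len(xs)))
              / sum(W[i][j] ** p for i in range(len(xs)))
              for j in range(len(cs))]
    return cs, W


if __name__ == "__main__":
    clean = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    print(fcm_1d(clean, [0.0, 12.0])[0])            # two centroids near the two groups
    print(fcm_1d(clean + [100.0], [0.0, 12.0])[0])  # the outlier drags a centroid away
```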

7. Clusters of documents can be summarized by finding the top terms (words) for the documents in the cluster, e.g., by taking the most frequent k terms, where k is a constant, say 10, or by taking all terms that occur more frequently than a specified threshold. Suppose that K-means is used to find clusters of both documents and words for a document data set.

(a) How might a set of term clusters defined by the top terms in a document cluster differ from the word clusters found by clustering the terms with K-means?

(b) How could term clustering be used to define clusters of documents?
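The two summarization rules in question 7, the top k terms and all terms above a frequency threshold, can be sketched as follows. The sketch assumes documents are already tokenized into lists of words and that cluster labels come from some earlier clustering step. The toy documents and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def cluster_term_counts(docs: List[List[str]], labels: List[int]) -> Dict[int, Counter]:
    """Total term frequencies for each document cluster."""
    counts: Dict[int, Counter] = {}
    for tokens, label in zip(docs, labels):
        counts.setdefault(label, Counter()).update(tokens)
    return counts


def top_k_terms(counts: Counter, k: int = 10) -> List[str]:
    """The k most frequent terms in a cluster (equal counts keep first-seen order)."""
    return [term for term, _ in counts.most_common(k)]


def terms_above(counts: Counter, threshold: int) -> List[str]:
    """All terms whose frequency in the cluster exceeds the threshold, alphabetically."""
    return sorted(t for t, c in counts.items() if c > threshold)


if __name__ == "__main__":
    docs = [["goal", "match", "team", "goal"], ["team", "coach", "goal"],
            ["stock", "market", "price"], ["market", "price", "trade", "price"]]
    labels = [0, 0, 1, 1]
    counts = cluster_term_counts(docs, labels)
    for label, c in sorted(counts.items()):
        print(label, top_k_terms(c, 2), terms_above(c, 1))
```

Nothing in these rules prevents the same term from appearing in the summaries of several document clusters.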

8. Suppose we find K clusters using Ward's method, bisecting K-means, and ordinary K-means. Which of these solutions represents a local or global minimum? Explain.
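As background for question 8, the sketch below covers ordinary K-means only, using Lloyd's algorithm on 1-D data. Each iteration never increases the sum of squared errors (SSE), yet where the algorithm stops depends on the initial centroids. The data and both starting configurations are hypothetical and were chosen so that one run halts at a poor local minimum.

```python
from typing import List, Tuple


def kmeans_1d(xs: List[float], centroids: List[float],
              max_iter: int = 100) -> Tuple[List[float], float]:
    """Lloyd's K-means on 1-D data from given initial centroids; returns (centroids, SSE)."""
    cs = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (lowest index on ties).
        groups: List[List[float]] = [[] for _ in cs]
        for x in xs:
            j = min(range(len(cs)), key=lambda c: abs(x - cs[c]))
            groups[j].append(x)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new_cs = [sum(g) / len(g) if g else c for g, c in zip(groups, cs)]
        if new_cs == cs:
            break
        cs = new_cs
    sse = sum(min((x - c) ** 2 for c in cs) for x in xs)
    return cs, sse


if __name__ == "__main__":
    xs = [0, 1, 10, 11, 20, 21]
    print(kmeans_1d(xs, [0, 1, 15.5]))  # stuck: SSE 101.0
    print(kmeans_1d(xs, [0, 10, 20]))   # three tight pairs: SSE 1.5
```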

9. You are given a data set with 100 records and are asked to cluster the data. You use K-means to cluster the data, but for all values of K, 1 ≤ K ≤ 100, the K-means algorithm returns only one non-empty cluster. You then apply an incremental version of K-means, but obtain exactly the same result. How is this possible? How would single link or DBSCAN handle such data?
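When reasoning about how DBSCAN would treat a given data set in question 9, it helps to have the basic algorithm in front of you. This is a minimal sketch of textbook DBSCAN with Euclidean distance, where a point's Eps-neighborhood includes the point itself. A core point has at least MinPts points within Eps, a border point lies within Eps of a core point, and everything else is noise. The brute-force neighborhood search and the toy data are assumptions for illustration.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Basic DBSCAN; returns a cluster id per point, or NOISE (-1)."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means not yet visited

    def region(i: int) -> List[int]:
        # Eps-neighborhood of point i, including i itself (brute force).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point and starts a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```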

 

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd