List of the aliments and their cluster membership

Reference no: EM131058156

Question 1

Get the dataset "food.txt" from GauchoSpace and read it into R. Alternatively, you can load this data set from the R package cluster.datasets with the following code:

library(cluster.datasets)
data(nutrients.meat.fish.fowl.1959)
The data set contains the quantity of Energy, Protein, Fat, Calcium and Iron in 27 different aliments.

The task here is to find meaningful clusters in the data. To this end, perform the following:
1. Find clusters using a K-means algorithm. Try out different values of K and determine your best solution. The number of clusters you choose should be based on appropriate measures of fit, for example SSE as defined in the book IDM, and on the interpretability of the results. For each value of K that you try out, provide:

a. the centroids
b. the size of each cluster and a list of the aliments and their cluster membership
c. the ratio between-SS/total-SS
d. a meaning (use your imagination) to each cluster formed, e.g. what are the summarizing characteristics of the aliments in group 1?
e. to answer part d above you might find it useful to draw a parallel coordinate plot of the centroids
2. Apply hierarchical clustering using min, max and average distances (respectively single, complete and average methods in R).
a. For each method produce a dendrogram with the labels of the aliments
b. What are the differences, if any, in using the three different measures of distance?
c. Can you identify clusters similar to those obtained by K-means clustering?
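
A minimal sketch of how the K-means and hierarchical clustering steps above might be set up in R. The helper names (kmeans_summary, plot_centroids, hclust_all) are hypothetical, standardising the variables with scale() is an assumption the task does not state, and the code assumes that the first column of the data frame holds the aliment names. The centroid plot uses base matplot() as a simple stand-in for a parallel coordinate plot.

```r
set.seed(1)  # K-means uses random starts; fix the seed for reproducibility

# Summarise one K-means fit on standardised data (standardising is an assumption).
kmeans_summary <- function(x, k, labels = rownames(x), nstart = 25) {
  if (is.null(labels)) labels <- seq_len(nrow(x))
  fit <- kmeans(scale(x), centers = k, nstart = nstart)
  list(
    centroids  = fit$centers,                 # in standardised units
    sizes      = fit$size,
    membership = data.frame(aliment = labels, cluster = fit$cluster,
                            row.names = NULL),
    sse        = fit$tot.withinss,            # total within-cluster SS
    ratio      = fit$betweenss / fit$totss    # between-SS / total-SS
  )
}

# Parallel-coordinate-style plot of the centroids, one line per cluster.
plot_centroids <- function(centroids) {
  matplot(t(centroids), type = "b", lty = 1, pch = 19, xaxt = "n",
          xlab = "variable", ylab = "standardised centroid value")
  axis(1, at = seq_len(ncol(centroids)), labels = colnames(centroids))
}

# Hierarchical clustering with the three linkages named in the task.
hclust_all <- function(x, methods = c("single", "complete", "average")) {
  d <- dist(scale(x))  # Euclidean distance on standardised variables
  fits <- lapply(methods, function(m) hclust(d, method = m))
  names(fits) <- methods
  fits
}

# Usage on the nutrient data, run only if the package is installed.
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data("nutrients.meat.fish.fowl.1959", package = "cluster.datasets")
  food   <- nutrients.meat.fish.fowl.1959
  x      <- food[sapply(food, is.numeric)]  # the nutrient columns
  labels <- as.character(food[[1]])         # assumes column 1 holds the names

  for (k in 2:5) {
    res <- kmeans_summary(x, k, labels)
    cat("K =", k, " between-SS/total-SS =", round(res$ratio, 3), "\n")
  }
  plot_centroids(kmeans_summary(x, 3, labels)$centroids)

  fits <- hclust_all(x)
  for (m in names(fits)) plot(fits[[m]], labels = labels, main = m)
}
```

Because the ratio between-SS/total-SS always increases with K, it is usually read alongside the cluster sizes and centroids rather than maximised on its own.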

Additional exercises for PStat 231
Question 2
Perform PCA of the food.txt data and use a biplot to visualize the first two PCs and the variables. Based on the biplot, one could still identify groups (clusters) of aliments with similar characteristics.

a. Is the grouping obtained by PCA similar to or different from that obtained by the clustering algorithms above? Explain in some detail.
b. Which technique do you find most useful in describing the data set? Why?
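
One way the PCA and biplot might be produced in R is sketched below. The helper name pca_biplot is hypothetical, scaling the variables to unit variance is an assumption (the variables are measured in different units), and the first column of the data frame is assumed to hold the aliment names.

```r
# PCA on centred, unit-variance variables, followed by a biplot of PC1 vs PC2.
pca_biplot <- function(x, labels = rownames(x)) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  biplot(pca, xlabs = labels, cex = 0.7)  # points = aliments, arrows = variables
  invisible(pca)
}

# Usage on the nutrient data, run only if the package is installed.
if (requireNamespace("cluster.datasets", quietly = TRUE)) {
  data("nutrients.meat.fish.fowl.1959", package = "cluster.datasets")
  food   <- nutrients.meat.fish.fowl.1959
  x      <- food[sapply(food, is.numeric)]  # the nutrient columns
  labels <- as.character(food[[1]])         # assumes column 1 holds the names

  pca <- pca_biplot(x, labels)
  print(summary(pca))  # proportion of variance explained by each PC
}
```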
Question 3
Suppose that we have four observations, for which we compute a dissimilarity matrix, given by

 -     0.3    0.4    0.7
0.3     -     0.5    0.8
0.4    0.5     -     0.45
0.7    0.8    0.45    -
For instance, the dissimilarity between the first and second observations is 0.3, and the dissimilarity between the second and fourth observations is 0.8.
a. On the basis of this dissimilarity matrix, sketch the dendrogram that results from hierarchically clustering these four observations using complete linkage. Be sure to indicate on the plot the height at which each fusion occurs, as well as the observations corresponding to each leaf in the dendrogram.

b. Suppose that we cut the dendrogram obtained in (a) such that two clusters result. Which observations are in each cluster?
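
A hand-drawn dendrogram for this question can be checked in R with a short sketch like the one below. The diagonal is filled with 0 here as an assumption (an observation's dissimilarity with itself); as.dist() reads only the lower triangle, so the diagonal does not affect the result.

```r
# Dissimilarity matrix for the four observations (diagonal set to 0 by assumption).
d <- matrix(c(0.0, 0.3, 0.4, 0.7,
              0.3, 0.0, 0.5, 0.8,
              0.4, 0.5, 0.0, 0.45,
              0.7, 0.8, 0.45, 0.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(1:4, 1:4))

# Complete linkage: the distance between two clusters is the largest
# pairwise dissimilarity between their members.
hc <- hclust(as.dist(d), method = "complete")

plot(hc, main = "Complete linkage", ylab = "fusion height")
print(hc$height)               # heights at which the fusions occur
clusters <- cutree(hc, k = 2)  # cut the tree into two clusters
print(clusters)
```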
