Reference no: EM132743904
Assignment - Cluster Analysis
Instructions:
1. Please submit a single R code file for all questions along with the written assignment.
2. Interpretation of results is as important as the R code. Code without interpretation will lead to a deduction in marks.
3. Name your files as follows "FIRSTNAME A2"
4. Discussion within the group is allowed; copying is not. Copying will incur a penalty for all parties involved.
5. If you have any doubts, do not assume; write an email and clarify.
6. Reading the documentation of the hclust function in R is strongly recommended before attempting the assignment.
Q1. Using the following data:
Cluster | A1 | A2
C1      | 30 | 10
C2      | 20 | 15
C3      | 15 |  5
C4      | 20 | 10
C5      |  5 | 20
a. Form clusters using the Euclidean distance measure and the complete linkage method. (A detailed calculation, as done in class, is required. You may use R to compute the distance matrix; if you do, illustrate a manual calculation of the distance between any two clusters.)
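The distance matrix and one manual check for part (a) might be set up along these lines. This is a minimal sketch, not a model answer: the object names (`pts`, `d`, `dm`, `d_c1_c2`, `d_c2c4_c1`) are illustrative, and only base R is assumed.

```r
# Q1 data as given in the table (rows C1..C5, columns A1 and A2)
pts <- matrix(c(30, 10,
                20, 15,
                15,  5,
                20, 10,
                 5, 20),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("C", 1:5), c("A1", "A2")))

# Euclidean distance matrix (dist() uses method = "euclidean" by default)
d <- dist(pts)
print(round(d, 3))

# Manual check for one pair, C1 and C2:
# sqrt((30 - 20)^2 + (10 - 15)^2) = sqrt(125)
d_c1_c2 <- sqrt(sum((pts["C1", ] - pts["C2", ])^2))

# Complete linkage: the distance between two clusters is the LARGEST
# pairwise distance between their members.
# Example: distance from the merged cluster {C2, C4} to C1
dm <- as.matrix(d)
d_c2c4_c1 <- max(dm[c("C2", "C4"), "C1"])
```

In this data, C1 and C3 are equally far from the cluster {C2, C4} under complete linkage, so the second merge is a tie. A hand calculation and software may break it differently without changing the merge heights.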
b. Represent the result with a dendrogram. Verify your answer by coding in R. (R code file to be submitted)
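One way to verify the hand calculation in R is sketched below, assuming the base `hclust`, `plot`, and `cutree` functions. The variable names are illustrative, and the two-cluster cut is shown only as an example.

```r
# Q1 data as given in the table (rows C1..C5, columns A1 and A2)
pts <- matrix(c(30, 10,
                20, 15,
                15,  5,
                20, 10,
                 5, 20),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("C", 1:5), c("A1", "A2")))

# Agglomerative clustering: Euclidean distance, complete linkage
hc <- hclust(dist(pts), method = "complete")

# Merge history: negative entries are single observations,
# positive entries refer to clusters formed at earlier steps
print(cbind(hc$merge, height = round(hc$height, 3)))

# Dendrogram
plot(hc, main = "Complete linkage, Euclidean distance", xlab = "", sub = "")

# Example cut into two clusters (illustrative choice of k)
groups <- cutree(hc, k = 2)
print(groups)
```

The merge heights printed here can be compared step by step with the manually updated distance matrices.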
Q2. Install the package "cluster.datasets" in R.
a. Load the data set "airline.distances.1966".
1. Describe the data set in 4-5 lines.
2. Perform cluster analysis using the single linkage method and interpret your results.
3. Perform cluster analysis using the complete linkage method and interpret your results.
4. Compare the results of the single linkage and complete linkage analyses (items 2 and 3).
Bonus Question
Perform cluster analysis using the centroid method and interpret your results. (This is a bonus question worth 3 marks if answered correctly. You may choose not to attempt it.)
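The comparison of linkage methods can be sketched as follows. Because the structure of "airline.distances.1966" is not described here, this sketch uses `eurodist`, a distance object that ships with base R, as a stand-in. It is not the assignment data, and the choice `k <- 3` is purely illustrative.

```r
# Stand-in data: 'eurodist' (road distances between 21 European cities)
# is part of base R and is already a 'dist' object.
# It is NOT the assignment data set.
d <- eurodist

hc_single   <- hclust(d, method = "single")
hc_complete <- hclust(d, method = "complete")

# Side-by-side dendrograms for visual comparison
op <- par(mfrow = c(1, 2))
plot(hc_single,   main = "Single linkage",   xlab = "", sub = "", cex = 0.7)
plot(hc_complete, main = "Complete linkage", xlab = "", sub = "", cex = 0.7)
par(op)

# Cut both trees into the same number of groups and cross-tabulate
k <- 3  # illustrative choice, not a recommendation
cmp <- table(single   = cutree(hc_single, k),
             complete = cutree(hc_complete, k))
print(cmp)

# Centroid method: the hclust documentation notes that it is meant to be
# used with squared Euclidean distances, hence d^2 here. Road distances
# are not Euclidean, so this line only illustrates the call.
hc_centroid <- hclust(d^2, method = "centroid")
```

Single linkage tends to produce chaining (long, straggly clusters), while complete linkage favors compact groups, so the cross-tabulation usually shows where the two solutions disagree.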
b. Load the data set "nutrients.meat.fish.fowl.1959".
1. Describe the data set in 4-5 lines.
2. Choose the number of clusters in advance using the elbow method.
3. Use the k-means clustering algorithm to partition this data set.
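The elbow method followed by k-means can be sketched as below. The built-in `USArrests` data is used as a stand-in, not the nutrients data. The variables are standardized because they are on different scales. The range of k, `nstart = 25`, and the final `k <- 4` are all illustrative assumptions.

```r
set.seed(1)  # k-means uses random starts; fix the seed for reproducibility

# Stand-in data (NOT the assignment data), standardized column by column
x <- scale(USArrests)

# Total within-cluster sum of squares for k = 1..8
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Elbow plot: look for the k after which the curve flattens
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Final partition with the k read off the plot (illustrative value)
k  <- 4
km <- kmeans(x, centers = k, nstart = 25)
print(km$size)

# Cluster profiles on the original scale, to support interpretation
print(aggregate(USArrests, by = list(cluster = km$cluster), FUN = mean))
```

The cluster means in the last table are what the written interpretation would typically be based on.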
Q3. Use the "Nike.xls" data set.
a. Cluster the respondents based on the identified variables using hierarchical clustering. Use Ward's method and squared Euclidean distances. How many clusters do you recommend, and why?
b. Cluster the respondents based on the identified variables using k-means clustering and the number of clusters identified in (a). Compare the results to those obtained in (a).
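A possible workflow for Q3 is sketched below. The contents of "Nike.xls" are not shown here, so the built-in `USArrests` data stands in for the respondent data. The column selection, the standardization, and `k <- 4` are assumptions for illustration only.

```r
set.seed(1)

# Stand-in for the respondents-by-variables matrix (NOT the Nike data)
x <- scale(USArrests)

# Ward's method on squared Euclidean distances.
# In hclust, method = "ward.D" applied to squared distances follows Ward's
# criterion; method = "ward.D2" applied to unsquared distances gives the
# same tree, with heights on the square-root scale.
d2      <- dist(x)^2
hc_ward <- hclust(d2, method = "ward.D")
hc_w2   <- hclust(dist(x), method = "ward.D2")

# A large jump in merge height in the dendrogram suggests where to cut
plot(hc_ward, main = "Ward's method, squared Euclidean", xlab = "", sub = "",
     cex = 0.6)

k <- 4  # illustrative; choose from the dendrogram and merge heights
hc_groups <- cutree(hc_ward, k = k)

# k-means with the same number of clusters
km <- kmeans(x, centers = k, nstart = 25)

# Cross-tabulation: a solution with one dominant cell per row and column
# indicates that the two methods largely agree (cluster labels are arbitrary)
tab <- table(ward = hc_groups, kmeans = km$cluster)
print(tab)
```

Cluster numbers from the two methods need not match, so agreement is read from the pattern of the table rather than from its diagonal.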
Attachment: Cluster Analysis.rar