Unsupervised Learning - kMeans and Fuzzy Clustering
INSTRUCTIONS:
You will need to write several distinct programs beyond k-Means to accomplish the tasks set out below. Provide pseudocode for each in the appropriate sections of your report.
Part I. Visualizing (a slice of) a WCSSE 5-D Error-Surface
Use the 17 data-instances from the demo kmeans Excel file seen in class (HW3_S20_kmeans.xls) for this first exercise (i.e. cells B3:C19 in kmeans tab). The first datapoint should be: (0.62, 0.98). If you have overwritten the original data in the course of your explorations, re-download the file to use the same dataset (and get the same results) as everyone else!
Generate a 4D array of size 11 x 11 x 11 x 11. Note: indices represent locations of the two cluster-heads!
Each of the four integer indices should range in value from 0 to 10, mapping to the 0 - 1 range (i.e. in steps of size 0.10) of the k-means graph-axes in the Excel demo. The following generic code-example illustrates saving a WCSSE value of 9.83 for a 2-cluster solution consisting of cluster-heads (0.1, 0.2) and (0.8, 0.5) to such an array:
surfArray[1][2][8][5] = 9.83
Write a function (possibly named: getWCSSE(data[], cheads[])) that computes the Within-Cluster Sum of Squared Errors (WCSSE) for a given pair of cluster-heads, and use it to fill in the WCSSE value for every one of the 11^4 possible 2-cluster solutions afforded by this array.
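For orientation, here is a minimal Python sketch of one way this could be structured. The CSV filename, the helper name get_wcsse, and the use of NumPy are assumptions for illustration only; any equivalent structure is fine.

import numpy as np

def get_wcsse(data, cheads):
    """Within-Cluster Sum of Squared Errors (WCSSE) for a fixed set of cluster-heads.

    Each datapoint contributes the squared Euclidean distance to its
    nearest cluster-head; the function returns the sum over all datapoints.
    """
    total = 0.0
    for point in data:
        total += min(sum((p - c) ** 2 for p, c in zip(point, head)) for head in cheads)
    return total

# The 17 two-dimensional instances from cells B3:C19 of the Excel demo,
# assumed here to have been exported to a CSV file (the filename is hypothetical).
data = np.loadtxt("kmeans_demo_points.csv", delimiter=",")   # data[0] should be (0.62, 0.98)

# Fill the 11 x 11 x 11 x 11 error surface; index i maps to coordinate i / 10.
surfArray = np.zeros((11, 11, 11, 11))
for x1 in range(11):
    for y1 in range(11):
        for x2 in range(11):
            for y2 in range(11):
                heads = [(x1 / 10, y1 / 10), (x2 / 10, y2 / 10)]
                surfArray[x1][y1][x2][y2] = get_wcsse(data, heads)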
Question 1. Display an Excel 3D Surface chart of the error-surface function w.r.t. indices (x1, y1), keeping the other centroid constant at (0,0)
• That is, for all values surfArray[x1][y1][0][0]
Question 2. Display an Excel 3D Surface chart of the error-surface function w.r.t. indices (x2, y2), keeping the other centroid constant at (10, 10)
• That is, for all values surfArray[10][10][x2][y2]
Explain the meaning of these two surface-graphs. How are they different?
Are they symmetric with respect to each other? Why or why not?
Do either of them hint at a possible solution near (0.2, 0.6) or (0.8, 0.56)?
Question 3. Display an Excel 3D Surface chart of the error-surface function w.r.t. indices (x1, y1) keeping the other centroid constant at (8, 6)
• That is, for all values surfArray[x1][y1][8][6]
Question 4. Display an Excel 3D Surface chart of the error-surface function w.r.t. indices (x2, y2) keeping the other centroid constant at (2, 6)
• That is, for all values surfArray[2][6][x2][y2]
Explain the meaning of these surface-graphs.
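If you compute the array outside of Excel, one convenient route to these charts is to dump a slice as an 11 x 11 grid that Excel can read and plot as a 3D Surface chart. The sketch below writes the surfArray[x1][y1][0][0] slice from Question 1; the output filename is hypothetical, and surfArray refers to the array filled in the earlier sketch.

import csv

# Write the 11 x 11 slice surfArray[x1][y1][0][0] as a grid for an Excel 3D Surface
# chart: the first row/column hold the centroid coordinates, the body holds WCSSE values.
with open("surface_q1.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([""] + [x1 / 10 for x1 in range(11)])   # header row: x1 coordinates
    for y1 in range(11):
        writer.writerow([y1 / 10] + [surfArray[x1][y1][0][0] for x1 in range(11)])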
To give you a sense of what to expect, here are screenshots of what two of these WCSSE surfaces should look like, in case that is helpful:
[Screenshot: WCSSE surface slice for (0, 0, x2, y2)]
[Screenshot: WCSSE surface slice for (10, 10, x2, y2)]
Part II. Clustering an Anonymous Dataset
Find the HW3_data.csv file in the class INFO folder. It contains 120 4-D instances to be clustered by your k-Means implementation.
Question 1. Implement the k-Means Algorithm as demonstrated by the in-class Excel demo.
Possible interface: kmeans(data[], cheads[])
Don't forget the heuristic for dealing with empty cluster sets. Your algorithm should output a clustering solution for the input data provided, as well as the final coordinates of the K centroids, and the WCSSE value of the solution.
-- provide pseudocode for your k-Means algorithm in your report
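For reference, a hedged Python sketch of one possible shape for such an implementation follows. The empty-cluster heuristic shown (re-seeding an empty centroid on a random datapoint) is only an assumption; substitute whichever heuristic was demonstrated in class.

import random

def kmeans(data, cheads, max_iters=100):
    """Lloyd-style k-Means: returns (assignments, final centroids, WCSSE)."""
    centroids = [list(c) for c in cheads]
    k = len(centroids)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: sum((p - c) ** 2
                                            for p, c in zip(point, centroids[j])))
            for point in data
        ]
        if new_assignments == assignments:
            break                                  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [data[i] for i, a in enumerate(assignments) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
            else:
                # Empty-cluster heuristic (assumption): re-seed on a random datapoint.
                centroids[j] = list(random.choice(data))
    wcsse = sum(sum((p - c) ** 2 for p, c in zip(data[i], centroids[a]))
                for i, a in enumerate(assignments))
    return assignments, centroids, wcsse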
Question 2. Determine the most natural value of K (i.e. BestK) to be used for this dataset. Use the average WCSSE of at least 10 trials for each value of K from 1 to 10.
-- show a graph of your trial results (average WCSSE vs. K) to support your choice of BestK
-- provide a solid argument for why your chosen BestK value seems appropriate for this dataset
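One way to organize these trials is sketched below. It reuses the kmeans() sketch above; the uniform random initializer and the helper names are assumptions.

import random

def random_centroids(data, k):
    """Initialize k centroids uniformly at random within each attribute's observed range."""
    columns = list(zip(*data))
    return [[random.uniform(min(col), max(col)) for col in columns] for _ in range(k)]

def average_wcsse_by_k(data, k_values=range(1, 11), trials=10):
    """Average the final WCSSE of several random restarts for each candidate K."""
    averages = {}
    for k in k_values:
        finals = [kmeans(data, random_centroids(data, k))[2] for _ in range(trials)]
        averages[k] = sum(finals) / len(finals)
    return averages   # plot K vs. average WCSSE and look for the 'elbow'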
Question 3. Generate clustering solutions for at least 10 randomly-initialized sets of centroids using your BestK value determined previously.
-- display the WCSSE values in a table, highlighting the best one
-- display a 2-D scatterplot of your best k-means clustering solution (you might use attributes attC and attD for the axes, though you may choose whichever two look best to you), including final centroid positions. Color-code the datapoints according to their cluster.
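For the scatterplot, a matplotlib sketch along the following lines may help; the column positions of attC and attD and the variable names (data, assignments, centroids from your best kmeans() run) are assumptions.

import matplotlib.pyplot as plt

# Assumes columns 2 and 3 of each instance hold attC and attD, and that
# assignments / centroids come from the best kmeans() run above.
plt.scatter([row[2] for row in data], [row[3] for row in data],
            c=assignments, cmap="viridis", s=25)                    # points colored by cluster
plt.scatter([c[2] for c in centroids], [c[3] for c in centroids],
            c="red", marker="x", s=120, label="final centroids")
plt.xlabel("attC")
plt.ylabel("attD")
plt.legend()
plt.show()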
Question 4. Find the HW3_train.csv file, which has the actual classifications of the datapoints (i.e. the Variable class column). If you did not find K=3 to be your BestK previously, generate your best K=3 clustering solution now. Using only your final set of three centroid coordinates, conduct a nearest-neighbor classification of the training set (ignoring the Variable column for this process).
-- assuming you group instances into clusters named Cluster_1, Cluster_2, etc., determine which of my classes (A, B, etc.) each cluster corresponds to by the class the majority of its members fall into. For example, if Cluster_x has 50 members, 30 of which are actually class A, 18 are B, and 2 are C -- then your Cluster_x has gravitated toward the concept of my class A. Show me the translations of your clusters to my classes (Note: this mapping will differ from one student's results to another's)
-- determine the accuracy-rate of your clustering solution against the actual classifications of the training set.
Find the HW3_test.csv file. Classify the 30 instances therein according to their nearest centroid in your K=3 solution.
-- determine the accuracy-rate of your clustering solution against the actual classifications of the testing set
-- what do your accuracy results suggest about the effectiveness of the k-Means algorithm for this particular classification task? Do you think it is possible for k-Means to correctly classify these instances at the 100% level? 90% level? 80%? Explain.
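A possible sketch of the nearest-centroid classification, the cluster-to-class translation, and the accuracy check is shown below; the majority-vote mapping via collections.Counter and the helper names are assumptions about how you might organize it.

from collections import Counter

def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))

def cluster_to_class_map(points, labels, centroids):
    """Translate each cluster index to the majority class among its members."""
    votes = {j: Counter() for j in range(len(centroids))}
    for point, label in zip(points, labels):
        votes[nearest_centroid(point, centroids)][label] += 1
    return {j: counter.most_common(1)[0][0] for j, counter in votes.items() if counter}

def accuracy(points, labels, centroids, mapping):
    """Fraction of instances whose cluster's class matches the actual class."""
    hits = sum(1 for point, label in zip(points, labels)
               if mapping.get(nearest_centroid(point, centroids)) == label)
    return hits / len(points)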
Question 5. Implement the Fuzzy Clustering (FC) algorithm as described in class and as illustrated within the kMeans Excel file example. Repeat Question 4 above, but now using the FC algorithm. Use the BestK determined from k-Means above as the number of clusters for this solution (Note: you should use the "more-complicated" of the two Expectation formulas from the class PowerPoint slide)
-- state the pseudocode of your implementation of this algorithm. What is your stopping condition? Why do you think this is reasonable?
-- when run multiple times, is there more or less variety in the solutions you generate (show me in a table)? Why is this so?
-- are your training/testing accuracy-rates better? Why do you suppose this is (or is not) so?
Graph and color-code your best FC solution for the training set.
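Because the exact Expectation formula comes from the class slides, the sketch below uses the standard fuzzy c-means membership update with fuzzifier m = 2 as a stand-in, stopping when no centroid moves by more than a small tolerance; adapt the formula and stopping condition to what was presented in class.

def fuzzy_cmeans(data, centroids, m=2.0, max_iters=100, tol=1e-4):
    """Standard fuzzy c-means (a stand-in for the FC algorithm from class).

    Returns (membership matrix, final centroids); stops once the largest
    centroid movement between iterations falls below `tol`.
    """
    k, n, dim = len(centroids), len(data), len(data[0])
    centroids = [list(c) for c in centroids]
    memberships = []
    for _ in range(max_iters):
        # Expectation: degree of membership of each point in each cluster.
        memberships = []
        for point in data:
            dists = [max(sum((p - c) ** 2 for p, c in zip(point, cen)) ** 0.5, 1e-12)
                     for cen in centroids]
            memberships.append([1.0 / sum((dists[j] / dists[l]) ** (2.0 / (m - 1.0))
                                          for l in range(k))
                                for j in range(k)])
        # Maximization: centroids become membership^m-weighted means of the data.
        new_centroids = []
        for j in range(k):
            weights = [memberships[i][j] ** m for i in range(n)]
            total = sum(weights)
            new_centroids.append([sum(weights[i] * data[i][d] for i in range(n)) / total
                                  for d in range(dim)])
        shift = max(sum((a - b) ** 2 for a, b in zip(old, new)) ** 0.5
                    for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return memberships, centroids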
Attachment: Data Mining.rar