Implement a simple k-means method, Applied Statistics

You are given an unlabeled data set that contains hidden structure. The task in this assignment is to perform a comprehensive cluster analysis that reveals this structure and identifies groups of similar data.

1. Implement a simple K-means method that can handle real-valued attributes. Your program must also support the Euclidean, City Block, Euclidean Squared and Chebyshev distances. You are free to use any kind of weights (for features or data instances) in the program if necessary.

2. Find the unlabeled data set test.txt and the initial centroids data set centroids.txt in the archive. Both files have the following format: [attribute1_value attribute2_value ... attribute90_value]. The unlabeled data set contains 350 samples and the initial centroids set contains 15 samples. Data instances in both files have 90 attributes.
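
The two tasks above can be sketched in plain Python as follows. This is a minimal illustration, not a reference solution. The names parse_rows, kmeans and DISTANCES are hypothetical, and the sketch rests on several assumptions: each line of a data file holds one instance as whitespace-separated real values (optionally wrapped in square brackets), no feature or instance weights are used, centroids are updated as the arithmetic mean of their members, and a cluster that loses all its members keeps its previous centroid. Note that the mean update is only guaranteed to reduce the objective for the Euclidean Squared distance; with the other distances it acts as a heuristic.

```python
def euclidean_squared(a, b):
    # Sum of squared coordinate differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    # Square root of the squared Euclidean distance.
    return euclidean_squared(a, b) ** 0.5

def city_block(a, b):
    # Sum of absolute coordinate differences (also called Manhattan distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical lookup table so the distance can be chosen by name.
DISTANCES = {
    "euclidean": euclidean,
    "city_block": city_block,
    "euclidean_squared": euclidean_squared,
    "chebyshev": chebyshev,
}

def parse_rows(text):
    # Assumed format: one instance per line, values separated by whitespace,
    # optionally enclosed in square brackets.
    rows = []
    for line in text.splitlines():
        line = line.strip().strip("[]")
        if line:
            rows.append([float(v) for v in line.split()])
    return rows

def kmeans(points, centroids, distance=euclidean_squared, max_iter=100):
    # Work on a copy so the caller's initial centroids are not modified.
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: mean of the members; empty clusters stay where they are.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With the files from the archive, one would pass the contents of test.txt and centroids.txt through parse_rows and call kmeans(points, centroids, distance=DISTANCES["chebyshev"]), swapping the key to compare the four distances.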




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd