Implement a simple k-means method, Applied Statistics

Assignment Help:

An unclassified data set contains hidden structure. The task in this assignment is to perform a comprehensive cluster analysis that reveals this structure and identifies groups of similar data.

1. Implement a simple K-means method that can handle real-valued attributes. Your program must also support the Euclidean, City Block, Squared Euclidean, and Chebyshev distances. You are free to use any kind of weights (for features or data instances) in the program if necessary.

2. Find the unlabeled data set test.txt and the initial centroids data set centroids.txt in the archive. Both files have the following format: [attribute1_value attribute2_value ... attribute90_value]. The unlabeled data set contains 350 samples and the initial centroids set contains 15 samples. Every data instance in both files has 90 attributes.
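
The two steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference solution. The function names (parse_rows, load_dataset, kmeans) are hypothetical. The parser assumes values are separated by whitespace and that the square brackets in the format description may or may not appear literally in the files. The centroid update uses the arithmetic mean for every distance, which is a simplification: the mean minimises the within-cluster sum of squared Euclidean distances, while for City Block distance the component-wise median would be the matching choice. Weights are not included.

```python
import math

def euclidean_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(euclidean_squared(a, b))

def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Metric names are illustrative labels, not prescribed by the assignment.
DISTANCES = {
    "euclidean": euclidean,
    "euclidean_squared": euclidean_squared,
    "city_block": city_block,
    "chebyshev": chebyshev,
}

def parse_rows(lines, n_attributes=90):
    """Parse whitespace-separated rows (assumed layout); skip blank lines."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip().strip("[]").strip()
        if not text:
            continue
        values = [float(token) for token in text.split()]
        if len(values) != n_attributes:
            raise ValueError(
                f"line {line_no}: expected {n_attributes} values, got {len(values)}"
            )
        rows.append(values)
    return rows

def load_dataset(path, n_attributes=90):
    with open(path) as handle:
        return parse_rows(handle, n_attributes)

def kmeans(points, centroids, metric="euclidean", max_iter=100):
    """Assign points to the nearest centroid, then recompute centroids as means."""
    dist = DISTANCES[metric]
    centroids = [list(c) for c in centroids]  # do not mutate the caller's data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With the files from the archive in the working directory, a call such as kmeans(load_dataset("test.txt"), load_dataset("centroids.txt"), metric="chebyshev") would return one cluster label per sample together with the final centroids. Running it once per metric makes it possible to compare how the choice of distance changes the grouping.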


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd