K-means cluster analysis, Advanced Statistics


K-means cluster analysis is a method of cluster analysis in which, starting from an initial partition of the observations into K clusters, each observation in turn is examined and reassigned, if appropriate, to a different cluster in an attempt to optimize some predefined numerical criterion that measures, in some sense, the 'quality' of the cluster solution. Many such clustering criteria have been suggested, but the most commonly used arise from considering features of the within-groups, between-groups and total matrices of sums of squares and cross products (W, B, T, where T = W + B), which can be defined for every partition of the observations into a particular number of groups. The two most common clustering criteria arising from these matrices are as follows:

minimization of trace W

minimization of determinant W

The first of these tends to produce 'spherical' clusters; the second tends to produce clusters that all have the same shape, although this shape will not necessarily be spherical.
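To make the two criteria concrete, the following is a minimal sketch in plain Python, not a reference implementation. It computes W, B and T for a given partition and performs one reassignment sweep that moves each observation to whichever cluster gives the lowest value of the chosen criterion. All function names (`scatter_matrices`, `reassign_pass`, `trace`, `det`) are hypothetical, and the rule of never emptying a cluster is an assumption added for the sketch.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter(rows, center):
    """Sum over rows of the outer product (row - center)(row - center)^T."""
    p = len(center)
    S = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - c for x, c in zip(row, center)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b]
    return S

def add(A, B):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scatter_matrices(X, labels, k):
    """Return (W, B, T) for observations X and integer labels in 0..k-1."""
    p = len(X[0])
    grand = mean_vector(X)
    T = scatter(X, grand)                      # total sums of squares and cross products
    W = [[0.0] * p for _ in range(p)]
    B = [[0.0] * p for _ in range(p)]
    for g in range(k):
        members = [x for x, lab in zip(X, labels) if lab == g]
        if not members:
            continue
        m = mean_vector(members)
        W = add(W, scatter(members, m))        # within-group part
        # between-group part: n_g * (m - grand)(m - grand)^T
        B = add(B, scatter([m] * len(members), grand))
    return W, B, T

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, result = len(A), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= f * A[c][j]
    return result

def reassign_pass(X, labels, k, criterion=trace):
    """One sweep: move each observation to the cluster minimizing criterion(W)."""
    labels = list(labels)
    for i in range(len(X)):
        home = labels[i]
        if labels.count(home) == 1:
            continue  # assumption of this sketch: never empty a cluster
        best_g = home
        best_val = criterion(scatter_matrices(X, labels, k)[0])
        for g in range(k):
            if g == home:
                continue
            labels[i] = g  # tentatively move observation i to cluster g
            val = criterion(scatter_matrices(X, labels, k)[0])
            if val < best_val:
                best_g, best_val = g, val
        labels[i] = best_g
    return labels
```

Calling `reassign_pass(X, labels, k, criterion=det)` switches from the trace criterion to the determinant criterion. In practice the sweep would be repeated until no observation changes cluster; recomputing W from scratch for every tentative move keeps the sketch short but is far slower than the incremental updates a production routine would use.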

 




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd