Outliers - reasons for screening data, Advanced Statistics


Outliers - Reasons for Screening Data

Outliers arise from data-entry errors, from subjects who are not members of the population the sample is meant to represent, or from subjects who genuinely differ from the rest of the sample. Statistical tests are quite sensitive to outliers, so the problem should be addressed before analysis.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. Standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n > 100, or 2.5 if n < 10).
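
The z-score rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names `z_threshold` and `flag_univariate_outliers` and the `scores` data are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form to use.

```python
from statistics import mean, stdev


def z_threshold(n):
    """Cutoff suggested in the text: 3 by default, 4 if n > 100, 2.5 if n < 10."""
    if n > 100:
        return 4.0
    if n < 10:
        return 2.5
    return 3.0


def flag_univariate_outliers(values):
    """Return (index, value, z) for each value whose |z| exceeds the cutoff."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    cutoff = z_threshold(len(values))
    flagged = []
    for i, v in enumerate(values):
        z = (v - m) / s
        if abs(z) > cutoff:
            flagged.append((i, v, z))
    return flagged


# Hypothetical data: 19 plausible scores plus one likely data-entry error.
scores = [52, 48, 50, 51, 49, 47, 53, 50, 52, 48,
          49, 51, 50, 46, 54, 50, 49, 51, 50, 500]

for index, value, z in flag_univariate_outliers(scores):
    print(f"index {index}: value {value}, z = {z:.2f}")
```

Note that an extreme value inflates the mean and standard deviation it is compared against, so in very small samples a z-score can never get large. That is one reason a lower cutoff is suggested for small n.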

Multivariate outliers are difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p < 0.001 level is treated as an outlier.
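
As a rough sketch of the Mahalanobis screening described above, the example below handles the two-variable case only, where the 2x2 covariance matrix can be inverted by hand and the chi-square upper-tail probability with 2 degrees of freedom has the closed form exp(-d²/2). The function names and the synthetic `points` are hypothetical. For more variables one would normally use a linear algebra library and a general chi-square routine such as `scipy.stats.chi2.sf`.

```python
import math
from statistics import mean


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_sf_2df(value):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-value / 2)


# Synthetic data: x and y are strongly correlated, except for the last case.
points = [(a + b, a - b) for a in range(10, 21) for b in (-1, 0, 1)]
points.append((21, 9))  # each coordinate lies within the observed range

d2 = mahalanobis_sq_2d(points)
outliers = [i for i, d in enumerate(d2) if chi2_sf_2df(d) < 0.001]
print("multivariate outliers at indices:", outliers)
```

The appended case is unremarkable on either variable taken alone, since both 21 and 9 occur elsewhere in the data, but it breaks the joint pattern. That is the kind of case univariate screening tends to miss.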

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep such cases in the analysis, steps can be taken to reduce their relative influence.
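
The text does not say which steps to take when outliers are kept. One common option, shown here purely as an assumption, is winsorizing: the most extreme values are pulled in to the nearest retained value rather than deleted. The function name `winsorize`, the parameter `k`, and the `scores` data are hypothetical.

```python
from statistics import mean


def winsorize(values, k=1):
    """Replace the k smallest and k largest values with their nearest retained neighbours."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2 * k < len(values)")
    ordered = sorted(values)
    low = ordered[k]                       # smallest value that is kept
    high = ordered[len(ordered) - 1 - k]   # largest value that is kept
    # Clamp every value into [low, high]; order and sample size are preserved.
    return [min(max(v, low), high) for v in values]


# Hypothetical data: 19 plausible scores plus one extreme value.
scores = [52, 48, 50, 51, 49, 47, 53, 50, 52, 48,
          49, 51, 50, 46, 54, 50, 49, 51, 50, 500]

adjusted = winsorize(scores, k=1)
print("mean before:", mean(scores))
print("mean after: ", mean(adjusted))
```

This symmetric version also adjusts the lowest value even when it is not an outlier. A one-sided variant or a data transformation may suit some analyses better.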

