Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers arise for three main reasons: data entry errors, a subject who is not a member of the population the sample is meant to represent, or a subject who genuinely differs from the rest of the sample. Many statistical tests are quite sensitive to outliers, so the problem should be addressed before analysis.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar tools. Standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
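The z-score rule can be sketched in a few lines of Python. This is only an illustration: the function name zscore_outliers, the sample data, and the choice of the sample (n - 1) standard deviation are assumptions, not part of the original note.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return (value, z) pairs whose absolute standard score exceeds threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    flagged = []
    for v in values:
        z = (v - mean) / sd
        if abs(z) > threshold:
            flagged.append((v, z))
    return flagged

# Hypothetical data: 29 ordinary scores between 50 and 56, plus one suspicious entry.
scores = [50 + (i % 7) for i in range(29)] + [120]
outliers = zscore_outliers(scores)
for value, z in outliers:
    print(f"value={value}, z={z:.2f}")
```

Passing threshold=4 or threshold=2.5 applies the alternative cutoffs for large or small samples. A lower cutoff matters for small n because a sample z-score can never exceed (n-1)/sqrt(n), which is roughly 2.85 when n=10, so the +/-3 rule could not flag anything in a sample that small.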

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The (squared) distance is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant at the p<0.001 level is treated as an outlier.
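A minimal sketch of the Mahalanobis screen, restricted to two variables so that it needs only the Python standard library: with two variables the inverse covariance matrix can be written out by hand, and the chi-square tail probability for 2 degrees of freedom is exactly exp(-d2/2). The function name, the data, and the two-variable restriction are assumptions made for illustration; an analysis with more variables would normally use a matrix library instead.

```python
import math
import statistics

def mahalanobis_outliers_2d(points, alpha=0.001):
    """Flag (x, y) points whose squared Mahalanobis distance is significant at alpha."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    flagged = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out explicitly.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        p_value = math.exp(-d2 / 2)  # chi-square upper-tail probability, df = 2
        if p_value < alpha:
            flagged.append(((x, y), d2, p_value))
    return flagged

# Hypothetical data: 30 points close to the line y = x, plus one case (5, 25)
# that breaks the pattern even though each coordinate is within the usual range.
points = [(i, i + ((i * 2) % 5 - 2) * 0.5) for i in range(30)] + [(5, 25)]
flagged = mahalanobis_outliers_2d(points)
for point, d2, p in flagged:
    print(f"point={point}, d2={d2:.2f}, p={p:.2e}")
```

The case (5, 25) would pass a univariate z-score screen on either variable, which is why a joint measure such as Mahalanobis distance is needed for multivariate outliers.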

In most cases, it is acceptable to drop the outlying value from the sample. If the researcher decides to keep the values in the analysis, steps can be taken to reduce the relative influence of the outliers.
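The note does not say which steps to take. One common option, shown here purely as an assumed example, is winsorizing: the most extreme values are pulled in to the nearest retained value rather than deleted. The function name winsorize and the data are hypothetical.

```python
def winsorize(values, k=1):
    """Replace the k smallest and k largest values with their nearest retained neighbours."""
    if k < 1 or 2 * k >= len(values):
        raise ValueError("k must be at least 1 and less than half the sample size")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Clamp every value into the [low, high] range; order of the input is preserved.
    return [min(max(v, low), high) for v in values]

# Hypothetical data with one extreme score.
raw = [50, 52, 51, 53, 120, 49]
adjusted = winsorize(raw, k=1)
print(adjusted)
```

The extreme case stays in the sample, so n is unchanged, but it no longer dominates the mean and variance. Transforming the variable (for example, taking logarithms of positive, right-skewed data) is another option that serves the same purpose.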


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd