Outliers - reasons for screening data, Advanced Statistics


Outliers - Reasons for Screening Data

Outliers can arise from data entry errors, from a subject who is not a member of the population the sample is meant to represent, or from a subject who genuinely differs from the rest of the population. Many statistical tests are quite sensitive to outliers, so the problem should be addressed before the analysis is run.

Univariate outliers are easy to detect with z-scores, box plots, histograms, and similar displays. As a rule of thumb, cases with standard scores beyond ±3 are treated as outliers; consider a cutoff of ±4 if n > 100, or ±2.5 if n < 10.
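
The rule of thumb above can be sketched in Python as follows. This is a minimal illustration, assuming standard scores are computed from the sample mean and sample standard deviation of the variable itself; the function names `z_cutoff` and `univariate_outliers` are hypothetical, not part of any library.

```python
import statistics


def z_cutoff(n):
    """Rule-of-thumb |z| cutoff from the text: 2.5 if n < 10, 4 if n > 100, otherwise 3."""
    if n < 10:
        return 2.5
    if n > 100:
        return 4.0
    return 3.0


def univariate_outliers(values):
    """Indices of cases whose standard score exceeds the cutoff in absolute value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    cut = z_cutoff(len(values))
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > cut]


# Usage: 20 scores, one of which is far from the rest.
scores = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 10, 11, 9, 10, 12, 8, 10, 11, 9, 40]
print(univariate_outliers(scores))  # [19]
```

Note that the mean and standard deviation here include the suspect value itself, so an extreme score also inflates the standard deviation it is divided by. In very small samples this caps the largest attainable |z| at (n - 1)/√n, which is one reason a lower cutoff is used when n is small.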

Multivariate outliers are harder to detect. Mahalanobis distance is one powerful technique for this case (discussed later). The squared distance of each case from the means of the variables is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose value is significant at the p < 0.001 level is treated as a multivariate outlier.
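
A standard-library-only sketch of the Mahalanobis check is shown below, assuming complete numeric data and a non-singular sample covariance matrix. It computes each case's squared distance D² from the vector of variable means, then flags cases whose upper-tail chi-square probability (df = number of variables) falls below 0.001. All function names are hypothetical; in practice a statistics package would normally supply the chi-square distribution, and the closed-form recurrence used here only covers integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 2 and df = 1, then step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1))
        k += 2
    return q


def solve(a, b):
    """Solve the linear system a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y


def mahalanobis_d2(rows):
    """Squared Mahalanobis distance of each case from the variable means,
    using the sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    d2 = []
    for r in rows:
        d = [r[j] - means[j] for j in range(p)]
        d2.append(sum(di * yi for di, yi in zip(d, solve(cov, d))))
    return d2


def multivariate_outliers(rows, alpha=0.001):
    """Indices of cases whose D² is significant at p < alpha on chi-square(df = p)."""
    p = len(rows[0])
    return [i for i, v in enumerate(mahalanobis_d2(rows)) if chi2_sf(v, p) < alpha]


# Usage: 40 well-behaved cases plus two extreme ones appended at the end.
data = [(1, 1), (1, -1), (-1, 1), (-1, -1)] * 10 + [(10, 10), (-10, -10)]
print(multivariate_outliers(data))  # [40, 41]
```

As a sanity check on the distances, when the sample covariance matrix is used the D² values of all cases sum to (n - 1) × p, which is 41 × 2 = 82 for the data above.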

In most cases it is acceptable to drop the outlying case from the sample. If the researcher decides to keep the values in the analysis, steps can instead be taken to reduce their relative influence.
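
The two options mentioned above are sketched below for a single variable. Dropping the case is direct. For keeping it with reduced influence, the sketch uses one commonly described approach, which is an assumption here since this page does not name a method: moving each flagged score to one unit beyond the most extreme retained score on the same side. The function names are hypothetical, and the outlier indices are assumed to come from a screen such as the z-score rule above.

```python
def drop_cases(values, outlier_idx):
    """Option 1: remove the flagged cases from the sample."""
    flagged = set(outlier_idx)
    return [v for i, v in enumerate(values) if i not in flagged]


def pull_in(values, outlier_idx, step=1):
    """Option 2 (assumed technique): keep every case, but move each flagged score
    to `step` units beyond the most extreme retained score on its side."""
    flagged = set(outlier_idx)
    kept = [v for i, v in enumerate(values) if i not in flagged]
    hi, lo = max(kept), min(kept)
    out = list(values)
    for i in flagged:
        if values[i] > hi:
            out[i] = hi + step
        elif values[i] < lo:
            out[i] = lo - step
    return out


# Usage: case 19 (the score of 40) was flagged by an earlier screen.
scores = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 10, 11, 9, 10, 12, 8, 10, 11, 9, 40]
print(len(drop_cases(scores, [19])))  # 19
print(pull_in(scores, [19])[19])      # 13
```

Pulling the score in keeps the case in the analysis and preserves its position as the most extreme value, while limiting how far it can shift the mean and inflate the variance.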

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd