Outliers - reasons for screening data, Advanced Statistics

Outliers - Reasons for Screening Data

Outliers arise for three main reasons: a data entry error, a subject who is not a member of the population the sample is meant to represent, or a subject who genuinely differs from the rest of the sample. Statistical tests are quite sensitive to outliers, so the problem should be addressed before the analysis is run.

Univariate outliers are easy to detect (z-scores, box plots, histograms, etc.). Cases with standard scores beyond +/-3 are treated as outliers (consider a cutoff of 4 if n>100, or 2.5 if n<10).
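
The z-score rule above can be sketched in a few lines of Python. This is only an illustration: the function name zscore_outliers and the scores list are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed.

```python
import statistics

def zscore_outliers(values, cutoff=3.0):
    """Return (value, z) pairs whose absolute standard score exceeds cutoff."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v, (v - mean) / sd) for v in values if abs(v - mean) / sd > cutoff]

# Hypothetical data: 20 typical scores plus one suspicious entry (95).
scores = [50, 52, 48, 51, 49, 53, 47, 50, 52, 48,
          51, 49, 50, 52, 48, 51, 49, 50, 53, 47, 95]

flagged = zscore_outliers(scores)          # default cutoff of 3
for value, z in flagged:
    print(f"value={value} z={z:.2f}")
```

Passing cutoff=4.0 or cutoff=2.5 applies the alternative thresholds suggested for large or very small samples.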

Multivariate outliers are more difficult to detect. Mahalanobis distance is one powerful technique to use in this case (discussed later). The squared distance of each case from the centroid is evaluated as a chi-square statistic with degrees of freedom equal to the number of variables in the analysis. A case whose chi-square value is significant beyond the p<0.001 level is treated as an outlier.
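
As a rough sketch of the Mahalanobis screen, the code below handles the two-variable case only, using the standard library. With two variables the chi-square statistic has 2 degrees of freedom, and its upper-tail probability is exp(-d2/2), so the p<0.001 cutoff is about 13.82. The function names and the data are hypothetical. For more variables, a matrix library and a general chi-square routine would be needed.

```python
import math
import statistics

def mahalanobis_sq_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the sample centroid."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = statistics.variance(xs)          # sample variance of x
    syy = statistics.variance(ys)          # sample variance of y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2             # determinant of the 2x2 covariance matrix
    result = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d' S^-1 d, written out with the closed-form inverse of a 2x2 matrix
        result.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return result

def chi2_sf_2df(d2):
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-d2 / 2)

# Hypothetical data: y tracks x closely, except for the last case (-8, 8),
# which is unremarkable on each variable alone but breaks the joint pattern.
xs = [x for x in range(-10, 11) if x != 0]
ys = [x + (1 if x % 2 else -1) for x in xs]
xs.append(-8)
ys.append(8)

d2 = mahalanobis_sq_2d(xs, ys)
outlier_rows = [i for i, d in enumerate(d2) if chi2_sf_2df(d) < 0.001]
print(outlier_rows)
```

The last case has univariate z-scores of only about 1.2 in absolute value on each variable, so a z-score screen alone would not catch it.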

In most cases, it is acceptable to drop the outlying case from the sample. If the researcher decides to keep the values in the analysis, steps can be taken to reduce their relative influence.
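
The text does not say how to reduce an outlier's influence. One common option, assumed here purely for illustration, is to pull extreme values in to the cutoff (a form of winsorizing) rather than delete them. The function name winsorize_by_z and the data are hypothetical.

```python
import statistics

def winsorize_by_z(values, cutoff=3.0):
    """Clamp each value to the interval mean +/- cutoff * sd (assumed approach)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample standard deviation
    low, high = mean - cutoff * sd, mean + cutoff * sd
    return [min(max(v, low), high) for v in values]

# Hypothetical data: the final score is far from the rest.
raw = [50, 52, 48, 51, 49, 53, 47, 50, 52, 48,
       51, 49, 50, 52, 48, 51, 49, 50, 53, 47, 95]

adjusted = winsorize_by_z(raw)
print(raw[-1], "->", round(adjusted[-1], 1))
```

The case stays in the sample, but its distance from the mean is capped, so it pulls less on means, variances and correlations.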




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd