Statistical modeling, Advanced Statistics

A comprehensive regression analysis of the London case study was carried out to test four assumptions of regression:

1. Variables are normally distributed

2. Linear relationship between the independent and dependent variables

3. Homoscedasticity

4. Variables are measured without error

A preliminary analysis found no missing or ambiguous data. Descriptive statistics were generated for the variables wfood, totexp, income, age and nk. For each variable the mean and the trimmed mean differ only slightly. The standard deviation and variance are low for wfood and nk, relatively high for age, and highest for totexp and income. All five variables have relatively high coefficients of variation, which indicates dispersed data.
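The comparison of mean and trimmed mean, and the coefficient of variation, can be sketched as follows. This is a minimal illustration, not the analysis that was actually run: the 5% trimming fraction, the function names and the sample values are all assumptions.

```python
import statistics

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    ordered = sorted(values)
    k = int(round(len(ordered) * proportion))  # observations cut from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)

# Made-up values with one extreme observation (not the London data)
sample = [10, 12, 11, 13, 12, 11, 10, 14, 12, 13,
          11, 12, 13, 10, 11, 12, 14, 13, 12, 95]

plain_mean = statistics.fmean(sample)   # pulled upward by the value 95
robust_mean = trimmed_mean(sample)      # the value 95 is trimmed away
cv_percent = coefficient_of_variation(sample)
```

In this made-up sample the mean is 16.05 while the trimmed mean is 12.0, so a large gap between the two points to extreme values. A small gap, as reported above, suggests the extremes have little influence on the mean.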

The box plots show the quartiles, minimum and maximum values, skewness and outliers of each variable. The box plot for nk is not informative because nk takes only the values 1 or 2. All box plots are positively skewed apart from the one for nk. The number of outliers is 5 for wfood, 47 for totexp, 58 for income, 27 for age and none for nk.
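A box plot usually flags as outliers the points lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below assumes that convention (the text does not say which rule was used), and the function name and data are hypothetical.

```python
import statistics

def boxplot_outliers(values):
    """Return the values outside the 1.5 * IQR fences of a box plot."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default "exclusive" method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Made-up right-skewed data with one extreme value
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 50]
# A two-valued variable, similar in spirit to nk
two_valued = [1, 1, 1, 1, 2, 2, 2, 2]

skewed_outliers = boxplot_outliers(skewed)          # [50]
two_valued_outliers = boxplot_outliers(two_valued)  # []
```

A variable that takes only two values has fences that enclose both values, which is why its box plot shows no outliers.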

Pearson's correlation was used to examine the linear relationship between each pair of variables. Multicollinearity is taken to be present when Pearson's correlation is greater than 0.9, which is the case for the following pairs:

  • Wfood and nk (highly correlated)
  • Totexp and income (highly correlated)
  • Totexp and age (highly correlated)
  • Income and age (highly correlated)
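The screening of variable pairs against the 0.9 threshold can be sketched as below. The data are made up and the helper names are hypothetical. The use of the absolute value, so that strong negative correlations are also flagged, is an assumption rather than something stated in the text.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(columns, threshold=0.9):
    """List (name_a, name_b, r) for every pair whose |r| exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Made-up columns, not the London data
data = {
    "totexp": [10, 20, 30, 40, 50],
    "income": [12, 19, 33, 41, 48],
    "nk": [1, 2, 1, 2, 1],
}
flagged = highly_correlated_pairs(data)
```

In this toy data only the totexp and income pair is flagged.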

In the multiple regression equation, wfood is the dependent variable (Y) and totexp, income, age and nk are the independent variables (X). The standard errors of the coefficients indicate that the estimated coefficients do not differ greatly from their true values. The overall goodness-of-fit (F) test rejects the null hypothesis that all slopes are zero, so at least one slope is not equal to zero and the multiple regression model fits. Each coefficient was then tested individually: totexp and income are not significant, whereas the constant, age and nk are significant because their t-statistics exceed the critical t value.
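The coefficient tests and the overall F test described above can be sketched with NumPy. This is an illustration on simulated data, not a reproduction of the London results: the function name, the simulated coefficients and the use of 1.96 as the large-sample 5% critical t value are assumptions.

```python
import numpy as np

def ols_summary(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares and return summary statistics."""
    A = np.column_stack([np.ones(len(y)), X])   # prepend the constant term
    n, k = A.shape                              # k counts the constant as well
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)                  # unexplained sum of squares
    sst = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    mse = sse / (n - k)                         # residual variance estimate
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))
    return {
        "coef": beta,
        "se": se,
        "t": beta / se,                         # t-statistic for each coefficient
        "r2": 1 - sse / sst,
        "f": ((sst - sse) / (k - 1)) / mse,     # overall F: all slopes equal zero
    }

# Simulated data: x1 matters, x2 has a true slope of zero
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 3.0 * x1 + 0.0 * x2 + rng.normal(scale=0.5, size=n)

fit = ols_summary(np.column_stack([x1, x2]), y)
significant = np.abs(fit["t"]) > 1.96           # assumed large-sample critical value
```

A coefficient is judged significant when the absolute value of its t-statistic exceeds the critical value, which is the comparison the paragraph above makes for the constant, age and nk.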

The VIF measures multicollinearity among the independent variables, and in this case the VIF values suggest that no independent variable is strongly correlated with the others. R-squared is relatively low, which indicates that the linear relationship with the X variables explains only a small share of the variation in wfood (Y). The adjusted R-squared, which penalizes the number of independent variables, is a more accurate measure of the goodness of fit of the model and is never higher than the R-squared.
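For reference, the VIF of an independent variable is 1 / (1 - R²), where R² comes from regressing that variable on the other independent variables. The adjusted R-squared is 1 - (1 - R²)(n - 1) / (n - p - 1) for n observations and p independent variables. A minimal sketch on simulated data, with hypothetical function names:

```python
import numpy as np

def r_squared(X, y):
    """R-squared of a least-squares regression of y on X plus a constant."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def adjusted_r_squared(r2, n, p):
    """Adjust R-squared for n observations and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def vif(X):
    """VIF of each column of X, from regressing it on the remaining columns."""
    return [
        1.0 / (1.0 - r_squared(np.delete(X, j, axis=1), X[:, j]))
        for j in range(X.shape[1])
    ]

# Simulated regressors: x2 is almost a copy of x1, x3 is unrelated
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)                                   # large for x1 and x2, near 1 for x3
adj = adjusted_r_squared(0.5, n=101, p=4)       # lower than the raw 0.5
```

A common rule of thumb, assumed here rather than taken from the text, treats a VIF above 10 as a sign of serious multicollinearity.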

The Durbin-Watson statistic tests for first-order autocorrelation. Its value of 1.98307 is close to 2, so there is no evidence of first-order autocorrelation.
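The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It lies between 0 and 4 and is roughly 2(1 - r), where r is the lag-1 autocorrelation of the residuals. A small sketch with made-up residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Made-up residual patterns, chosen only to show the two extremes
alternating = [1, -1, 1, -1, 1, -1]   # negative autocorrelation, statistic above 2
persistent = [1, 1, 1, -1, -1, -1]    # positive autocorrelation, statistic below 2

dw_alternating = durbin_watson(alternating)   # 20 / 6, about 3.33
dw_persistent = durbin_watson(persistent)     # 4 / 6, about 0.67
```

A value near 2, such as the 1.98307 reported above, sits between these two extremes.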

The Anderson-Darling, Ryan-Joiner and Kolmogorov-Smirnov tests reported with the normal probability plots show that the random errors are not normally distributed. However, the points on the probability plot lie close to the straight line, so the assumption of normality was treated as approximately satisfied.

The histograms show the skewness, kurtosis and distribution of each variable graphically. All histograms are positively skewed apart from the nk histogram, which is negatively skewed. Kurtosis measures how flat or peaked a distribution is: wfood and nk are relatively flat, whereas income, totexp and age are relatively peaked compared with the normal distribution.
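Skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the plain moment formulas. Statistical packages usually apply small-sample adjustments, so their figures differ slightly. Function names and data are hypothetical.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Moment skewness: positive for a long right tail, negative for a long left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment kurtosis minus 3: negative is flatter than normal, positive more peaked."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]       # made-up, evenly spread
right_tailed = [1, 1, 1, 2, 10]   # made-up, one large value

skew_symmetric = skewness(symmetric)          # 0: no skew
kurt_symmetric = excess_kurtosis(symmetric)   # -1.3: flatter than a normal distribution
skew_right = skewness(right_tailed)           # positive
```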

The Lagrange Multiplier, White's general, Glejser and Park tests show that there is heteroscedasticity in the model, but the Breusch-Pagan test shows none. With a large sample of 1519 observations it is difficult to judge from the time series plots whether autocorrelation exists, but it was concluded that none is present. Applying weighted least squares as a remedial measure still shows heteroscedasticity.
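One common LM-type check for heteroscedasticity regresses the squared residuals on the independent variables and compares n times the R² of that auxiliary regression with a chi-square critical value. This is Koenker's studentized form of the Breusch-Pagan test. The sketch below shows that form together with a weighted least squares fit. The simulated data, the weights 1/x² and the function names are assumptions, and the text does not say which variant of each test was used.

```python
import numpy as np

def _with_constant(X):
    """Prepend a column of ones to the regressors."""
    return np.column_stack([np.ones(len(X)), X])

def breusch_pagan_lm(X, y):
    """n * R-squared from regressing squared OLS residuals on the regressors."""
    A = _with_constant(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    squared_resid = (y - A @ beta) ** 2
    gamma, *_ = np.linalg.lstsq(A, squared_resid, rcond=None)
    unexplained = ((squared_resid - A @ gamma) ** 2).sum()
    total = ((squared_resid - squared_resid.mean()) ** 2).sum()
    return len(y) * (1 - unexplained / total)

def weighted_least_squares(X, y, weights):
    """Least squares with each observation scaled by the square root of its weight."""
    A = _with_constant(X)
    root_w = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(A * root_w[:, None], y * root_w, rcond=None)
    return beta

# Simulated data whose error spread grows with x (heteroscedastic by construction)
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=500)
y = 1.0 + 2.0 * x + x * rng.normal(size=500)

lm_statistic = breusch_pagan_lm(x, y)
# With one regressor the 5% chi-square critical value (1 degree of freedom) is 3.841
heteroscedastic = lm_statistic > 3.841
beta_wls = weighted_least_squares(x, y, weights=1.0 / x**2)
```

Weighted least squares only removes heteroscedasticity when the weights match the true variance pattern, which may be why the remedial fit described above still shows it.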

The cross correlation for RESI1 suggests possible negative autocorrelation; however, the autocorrelation function and the partial autocorrelation function for RESI1 show no autocorrelation. The Ljung-Box Q (LBQ) test and the LM test also indicate that there is no autocorrelation.
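The autocorrelation function and the Ljung-Box Q statistic can be sketched as follows. Q is n(n + 2) times the sum over lags k of r_k² / (n - k), and it is compared with a chi-square critical value whose degrees of freedom equal the number of lags. The series below is made up to show what strong autocorrelation looks like, and the function names are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [v - mean for v in series]
    denominator = sum(d * d for d in deviations)
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    return numerator / denominator

def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(series)
    total = sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * total

# Made-up residuals that flip sign every period (strong negative autocorrelation)
alternating = [1, -1] * 10

r1 = autocorrelation(alternating, 1)   # -0.95
q1 = ljung_box_q(alternating, 1)       # about 20.9, far above 3.841
```

For residuals without autocorrelation, Q stays below the critical value, which is the outcome reported for RESI1.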

The model was then revised to see whether the assumptions of regression could be met. Seventeen clearly visible outliers, identified from the time series plots and scatter plots, were removed. The variable income was also dropped because the best subsets results and the F (Wald) test indicated that it was not an influential variable and that the model would be better without it.
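Whether a variable can be dropped is often checked with a partial F test that compares the residual sum of squares of the full and reduced models. The sketch below shows that comparison on simulated data. It is assumed, not confirmed by the text, that this is the form of F test that was used, and all names are hypothetical.

```python
import numpy as np

def residual_ss(X, y):
    """Residual sum of squares from regressing y on X plus a constant."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def partial_f(X_full, X_reduced, y):
    """F statistic for the variables present in X_full but not in X_reduced."""
    n = len(y)
    k_full = X_full.shape[1] + 1                    # slopes plus the constant
    dropped = X_full.shape[1] - X_reduced.shape[1]  # number of restrictions
    sse_full = residual_ss(X_full, y)
    sse_reduced = residual_ss(X_reduced, y)
    return ((sse_reduced - sse_full) / dropped) / (sse_full / (n - k_full))

# Simulated data: x1 drives y, x2 is irrelevant
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([x1, x2])

f_drop_x1 = partial_f(X, X[:, [1]], y)   # large: x1 should stay in the model
f_drop_x2 = partial_f(X, X[:, [0]], y)   # F for removing the irrelevant x2
```

With one variable dropped and a large sample, an F value below roughly 3.84 (the 5% critical value) indicates that the variable adds little to the model.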

R-squared increased slightly, which is an improvement, though a small one. The majority of tests still indicate heteroscedasticity but no autocorrelation.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd